In the new producer, a client can specify the partition number for each message. Then, any partitioning strategy can be implemented by the client.
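As a rough illustration of choosing the partition on the client side, here is a minimal sketch of a round-robin chooser. The class name `RoundRobinPartitionChooser` and its `next()` method are hypothetical and not part of Kafka. The sketch assumes the caller already knows the partition count for the topic and passes the returned number as the per-message partition when building the record for the new producer (for example, the partition argument of `ProducerRecord`). Round-robin is only one possible strategy; a hash-based or rate-aware one could be swapped in the same way.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not a Kafka class): picks a partition number for each
 * message in round-robin order. The partition count is assumed to be known
 * by the caller.
 */
final class RoundRobinPartitionChooser {
    private final int numPartitions;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinPartitionChooser(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    /** Returns 0, 1, ..., numPartitions - 1 and then wraps around. Thread-safe. */
    int next() {
        // floorMod keeps the result non-negative even if the counter overflows.
        return (int) Math.floorMod(counter.getAndIncrement(), (long) numPartitions);
    }

    public static void main(String[] args) {
        RoundRobinPartitionChooser chooser = new RoundRobinPartitionChooser(3);
        for (int i = 0; i < 7; i++) {
            // In a real client, this number would be set as the message's partition.
            System.out.println("message " + i + " -> partition " + chooser.next());
        }
    }
}
```

Each producer instance that uses such a chooser spreads its own messages evenly over the partitions, regardless of how much it produces relative to other producers.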
Thanks,

Jun

On Thu, Aug 7, 2014 at 1:37 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:

> The root of the problem is consumer lag on one or two partitions, even with
> a no-op consumer (read the log and discard it). Our use case is very simple:
> send all the log lines to the brokers. But under a storm of data (due to
> exceptions, application errors, etc.), one or two partitions lag behind
> while the other consumers are at 0 lag. We have tuned the GC using the
> recommended GC settings (according to the tuning section of
> http://www.slideshare.net/ToddPalino/enterprise-kafka-kafka-as-a-service ).
> In normal situations, this is OK.
>
> Hashing based on a key, and sticking to Murmur hash(key) % number of
> partitions, did not give better throughput compared to random
> partitioning. It would be good to build intelligence about producer
> selection based on the rate of data for a topic and/or lag. Is there any
> way to customize the stickiness interval for the random partitioning
> strategy?
>
> Sorry for the late response.
>
> Thanks,
>
> Bhavesh
>
> On Mon, Aug 4, 2014 at 6:50 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> > Bhavesh, take a look at
> > https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
> > ?
> >
> > Maybe the root cause is something else? Even if producers produce more
> > or less than what they are producing now, you should be able to make it
> > random enough with a partitioner and a key. I don't think you should
> > need more than what is in the FAQ, but in case you do, maybe look into
> > http://en.wikipedia.org/wiki/MurmurHash as another hash option.
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > ********************************************/
> >
> > On Mon, Aug 4, 2014 at 9:12 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> >
> > > How to achieve uniform distribution of non-keyed messages per topic
> > > across all partitions?
> > >
> > > We have tried to get this uniform distribution across partitions using
> > > custom partitioning from each producer instance with round robin
> > > (count(messages) % number of partitions for the topic). This strategy
> > > results in very poor performance, so we have switched back to the
> > > random stickiness that Kafka provides out of the box per some interval
> > > (10 minutes, not sure exactly) per topic.
> > >
> > > The above strategy sometimes results in consumer-side lag for some
> > > partitions, because some applications/producers produce more messages
> > > for the same topic than other servers.
> > >
> > > Can Kafka provide uniform distribution out of the box by coordinating
> > > among all producers, relying on a measured rate such as # of messages
> > > per minute or # of bytes produced per minute to achieve uniform
> > > distribution, and coordinating the stickiness of partitions among
> > > hundreds of producers for the same topic?
> > >
> > > Thanks,
> > >
> > > Bhavesh