[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174373#comment-14174373 ]
Ewen Cheslack-Postava commented on KAFKA-1710:
----------------------------------------------

[~Bmis13] That approach just pushes the problem into KafkaAsyncProducer's thread that processes messages -- there won't be lock contention in KafkaProducer since KafkaAsyncProducer will be its only user, but you may not get an improvement in throughput because ultimately you're limited to the CPU time a single thread can get. It may even get *slower*: you'll have more runnable threads at any given time, which means the KafkaAsyncProducer worker thread will get less CPU time.

Even disregarding that, since you used a LinkedBlockingQueue, that queue will become your new source of contention (it must be synchronized internally). If you give it a very large capacity, the producing threads can keep making progress and contention will be lower, since the time spent adding an item is very small -- but it will cost a lot of memory because you're just adding a layer of buffering. That might be useful if you have bursty traffic (the buffer lets you temporarily hold more data while the KafkaProducer works on getting it sent), but with sustained traffic you'll just have constantly growing memory usage. If the capacity is small, the threads producing messages will eventually end up blocked waiting for space in the queue.

Probably the biggest issue here is that this test writes to a single partition in a single topic. You could improve performance by using more partitions in that topic. You're already writing to all producers from all threads, so you must not need the ordering guarantees of a single partition. If you still want a single partition, you can improve performance by using more Producers, which will spread the contention across more queues. Since you already have 4 that you're running round-robin on, I'd guess adding more shouldn't be a problem.

In any case, this use case seems a bit odd.
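To illustrate that last suggestion (spreading sends round-robin across more producer instances), here is a minimal stdlib-only sketch. The RoundRobinDispatcher class and the Consumer<String> stand-ins are hypothetical and not part of the reporter's TestNetworkDownProducer; in practice each Consumer would wrap a real KafkaProducer.send call.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: fan sends out round-robin across several producer
// instances so that contention on any one producer's internal accumulator
// lock is divided by the pool size. Consumer<String> stands in for a real
// KafkaProducer so the sketch stays self-contained.
class RoundRobinDispatcher {
    private final List<Consumer<String>> producers;
    private final AtomicLong next = new AtomicLong();

    RoundRobinDispatcher(List<Consumer<String>> producers) {
        this.producers = producers;
    }

    void send(String message) {
        // floorMod keeps the index non-negative even if the counter wraps.
        int idx = Math.floorMod((int) next.getAndIncrement(), producers.size());
        producers.get(idx).accept(message);
    }
}
```

With real KafkaProducer instances behind the stand-ins, each caller contends on only one of N internal buffers rather than a single shared one, at the cost of extra connections and batching being split across producers.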
Are you really going to have 200 threads generating messages *as fast as they can* with only 4 producers?

As far as this issue is concerned, the original report said the problem was deadlock, but that doesn't seem to be the case. If you're just worried about performance, it probably makes more sense to move the discussion over to the mailing list. It'll be seen by more people, and there will probably be multiple suggestions for improvements to your approach before we have to make changes to the Kafka code.

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is being sent to single partition
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1710
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1710
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>         Environment: Development
>            Reporter: Bhavesh Mistry
>            Priority: Critical
>              Labels: performance
>         Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, th6.dump, th7.dump, th8.dump, th9.dump
>
> Hi Kafka Dev Team,
> When I run the test to send messages to a single partition for 3 minutes or so, I encounter deadlock (please see the screen shot attached) and thread contention from YourKit profiling.
> Use Case:
> 1) Aggregating messages into the same partition for metric counting.
> 2) Replicating the old Producer behavior of sticking to a partition for 3 minutes.
> Here is the output:
> Frozen threads found (potential deadlock)
>
> It seems that the following threads have not changed their stack for more than 10 seconds. These threads are possibly (but not necessarily!) in a deadlock or hung.
>
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
>
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
>
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
>
> Thanks,
> Bhavesh

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)