[ https://issues.apache.org/jira/browse/KAFKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174373#comment-14174373 ]

Ewen Cheslack-Postava commented on KAFKA-1710:
----------------------------------------------

[~Bmis13] That approach just pushes the problem into KafkaAsyncProducer's 
thread that processes messages -- there won't be lock contention in 
KafkaProducer since KafkaAsyncProducer will be the only user of it, but you may 
not get an improvement in throughput because ultimately you're limited to the 
time a single thread can get. It may even get *slower* because you'll have more 
runnable threads at any given time, which means that the KafkaAsyncProducer 
worker thread will get less CPU time. Even disregarding that, the 
LinkedBlockingQueue you used will become your new source of contention, since 
it must be synchronized internally. If you have a very large capacity, that'll 
let the threads continue to make progress and contention will be lower since 
the time spent adding an item is very small, but it will cost a lot of memory 
since you're just adding a layer of buffering. That might be useful if you have 
bursty traffic (the buffer allows you to temporarily buffer more data while the 
KafkaProducer works on getting it sent), but if you have sustained traffic 
you'll just have constantly growing memory usage. If the capacity is small, 
then the threads producing messages will eventually end up getting blocked 
waiting for there to be space in the queue.
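To make the trade-off concrete, here is a minimal, hypothetical sketch of the async-wrapper pattern being discussed: caller threads enqueue into a bounded LinkedBlockingQueue and a single worker thread drains it. The class and names are illustrative stand-ins, not Kafka's actual API; the point is just where the blocking and contention reappear.

```java
import java.util.concurrent.*;

// Hypothetical sketch of the "KafkaAsyncProducer" idea: many caller
// threads enqueue records, one worker thread drains them into the
// underlying send path. Not Kafka's API -- illustrative only.
class AsyncProducerSketch<T> implements AutoCloseable {
    private final BlockingQueue<T> queue;
    private final Thread worker;
    private volatile boolean running = true;

    AsyncProducerSketch(int capacity, java.util.function.Consumer<T> send) {
        // Small capacity: callers block in send() when it fills up.
        // Large capacity: callers rarely block, but memory grows under
        // sustained load -- exactly the trade-off described above.
        queue = new LinkedBlockingQueue<>(capacity);
        worker = new Thread(() -> {
            try {
                while (running || !queue.isEmpty()) {
                    T record = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (record != null) send.accept(record);
                }
            } catch (InterruptedException ignored) { }
        });
        worker.start();
    }

    // The queue's internal lock is the new contention point shared by
    // all caller threads; put() blocks when the queue is full.
    void send(T record) throws InterruptedException {
        queue.put(record);
    }

    @Override
    public void close() throws InterruptedException {
        running = false;   // worker drains remaining records, then exits
        worker.join();
    }
}
```

Because a single worker drains the queue, total throughput is still capped by what one thread can push through the underlying producer, no matter how many callers there are.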

Probably the biggest issue here is that this test only writes to a single 
partition in a single topic. You could improve performance by using more 
partitions in that topic. You're already writing to all producers from all 
threads, so you must not need the ordering guarantees of a single partition. If 
you still want a single partition, you can improve performance by using more 
Producers, which will spread the contention across more queues. Since you 
already have 4 that you're running round-robin on, I'd guess adding more 
shouldn't be a problem.
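The round-robin-over-more-producers suggestion can be sketched like this. The `Consumer` stand-in below represents a producer instance (not Kafka's `KafkaProducer`); the names are hypothetical and only the distribution logic matters.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Illustrative round-robin dispatch across several producer instances,
// as suggested above. Each producer has its own internal queue/lock, so
// contention from N caller threads is divided across all producers.
class RoundRobinSender<T> {
    private final List<Consumer<T>> producers;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinSender(List<Consumer<T>> producers) {
        this.producers = producers;
    }

    void send(T record) {
        // Pick the next producer in rotation; getAndIncrement() is
        // lock-free, so the dispatch itself adds no blocking.
        int i = (int) (counter.getAndIncrement() % producers.size());
        producers.get(i).accept(record);
    }
}
```

Note this spreads contention but does not preserve any per-producer ordering across the pool, which is fine here since all threads already write to all producers.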

In any case, this use case seems a bit odd. Are you really going to have 200 
threads generating messages *as fast as they can* with only 4 producers?

As far as this issue is concerned, the original report said the problem was 
deadlock but that doesn't seem to be the case. If you're just worried about 
performance, it probably makes more sense to move the discussion over to the 
mailing list. It'll probably be seen by more people and there will probably be 
multiple suggestions for improvements to your approach before we have to make 
changes to the Kafka code.

> [New Java Producer Potential Deadlock] Producer Deadlock when all messages is 
> being sent to single partition
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1710
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1710
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>         Environment: Development
>            Reporter: Bhavesh Mistry
>            Priority: Critical
>              Labels: performance
>         Attachments: Screen Shot 2014-10-13 at 10.19.04 AM.png, Screen Shot 
> 2014-10-15 at 9.09.06 PM.png, Screen Shot 2014-10-15 at 9.14.15 PM.png, 
> TestNetworkDownProducer.java, th1.dump, th10.dump, th11.dump, th12.dump, 
> th13.dump, th14.dump, th15.dump, th2.dump, th3.dump, th4.dump, th5.dump, 
> th6.dump, th7.dump, th8.dump, th9.dump
>
>
> Hi Kafka Dev Team,
> When I run the test sending messages to a single partition for about 3 
> minutes, I encounter a deadlock (please see the attached screenshots) and 
> thread contention in YourKit profiling.  
> Use Case:
> 1)  Aggregating messages into the same partition for metric counting. 
> 2)  Replicating the old producer's behavior of sticking to a partition for 3 minutes.
> Here is output:
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more 
> than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> pool-1-thread-128 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> pool-1-thread-55 <--- Frozen for at least 2m
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition,
>  byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, 
> Callback) KafkaProducer.java:237
> org.kafka.test.TestNetworkDownProducer$MyProducer.run() 
> TestNetworkDownProducer.java:84
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 
> ThreadPoolExecutor.java:1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 
> ThreadPoolExecutor.java:615
> java.lang.Thread.run() Thread.java:744
> Thanks,
> Bhavesh 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
