Yi Pan (Data Infrastructure) created SAMZA-1069:
---------------------------------------------------

             Summary: Deadlock between KafkaSystemProducer and KafkaProducer 
from kafka-clients lib
                 Key: SAMZA-1069
                 URL: https://issues.apache.org/jira/browse/SAMZA-1069
             Project: Samza
          Issue Type: Bug
            Reporter: Yi Pan (Data Infrastructure)


We have identified a deadlock between the main thread, which calls 
KafkaSystemProducer.close(), and the KafkaProducer client lib's network 
thread, which invokes the callback function set in KafkaSystemProducer.send().

The scenario is the following:
# The SamzaContainer main thread catches an exception from a previous commit 
and the container initiates shutdown, which calls 
{code}KafkaSystemProducer.stop(){code}. That call grabs the synchronized 
{code}producerLock{code} in {code}KafkaSystemProducer{code} and then calls 
{code}KafkaProducer.flush(){code} to wait for all pending requests to complete.
# The {code}KafkaProducer{code} network I/O thread then calls 
KafkaSystemProducer’s callback function (in {code}RecordBatch.done(){code}), 
which blocks waiting for the same {code}producerLock{code} in 
{code}KafkaSystemProducer{code} before it can return, call 
{code}producerFuture.done(){code}, and release the {code}CountDownLatch{code} 
that the main thread in {code}KafkaSystemProducer.close(){code} is waiting on. 
Hence, deadlock!
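
The two steps above can be reproduced in isolation with plain JDK 
concurrency primitives. The following is only a minimal sketch, not Samza or 
Kafka code: the class name, method name, and local variables are hypothetical 
stand-ins for the lock and latch described in this issue, and both waits are 
bounded with timeouts so that the demo terminates instead of hanging forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the lock/latch cycle; not actual Samza code.
final class DeadlockSketch {
    private DeadlockSketch() {}

    // Returns true if the simulated flush finished while the lock was held.
    static boolean flushCompletesWhileLockHeld() {
        // Stand-in for producerLock in KafkaSystemProducer.
        final ReentrantLock producerLock = new ReentrantLock();
        // Stand-in for the latch that the flushing thread waits on.
        final CountDownLatch pendingSends = new CountDownLatch(1);

        // Stand-in for the network I/O thread running the send callback.
        Thread networkThread = new Thread(() -> {
            try {
                // The callback needs the lock before it can finish.
                // The real callback would block here without a timeout.
                if (producerLock.tryLock(500, TimeUnit.MILLISECONDS)) {
                    producerLock.unlock();
                    // Only after the callback returns is the latch released.
                    pendingSends.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        boolean flushed;
        producerLock.lock(); // main thread: stop() grabs the lock ...
        try {
            networkThread.start();
            // ... and waits for pending sends while still holding it.
            flushed = pendingSends.await(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            producerLock.unlock();
        }

        try {
            networkThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flushed;
    }
}
```

In this sketch {code}DeadlockSketch.flushCompletesWhileLockHeld(){code} 
returns false: the simulated flush cannot complete while the lock is held, 
because the only thread able to release the latch is waiting for that lock.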

We need to make sure that KafkaSystemProducer.close() cannot run into a race 
condition with the callbacks triggered by the KafkaProducer's network thread.
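
One possible way to break the cycle, shown here only as a hedged sketch and 
not necessarily the fix that should be adopted, is to hold the lock just for 
the state change and to wait for pending sends after releasing it. As before, 
the class name, method name, and local variables are hypothetical and the 
code uses only JDK primitives, not the Samza or Kafka classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of waiting outside the lock; not actual Samza code.
final class StopOutsideLockSketch {
    private StopOutsideLockSketch() {}

    // Returns true if the simulated flush finished within the timeout.
    static boolean flushCompletesWhenLockReleased() {
        // Stand-in for producerLock in KafkaSystemProducer.
        final ReentrantLock producerLock = new ReentrantLock();
        // Stand-in for the latch that the flushing thread waits on.
        final CountDownLatch pendingSends = new CountDownLatch(1);

        // Stand-in for the network I/O thread running the send callback.
        Thread networkThread = new Thread(() -> {
            producerLock.lock(); // callback still takes the lock briefly
            try {
                // Bookkeeping that needs the lock would go here.
            } finally {
                producerLock.unlock();
            }
            pendingSends.countDown(); // released once the callback returns
        });

        producerLock.lock();
        try {
            // State changes that need the lock would go here.
            networkThread.start();
        } finally {
            producerLock.unlock(); // released BEFORE waiting on the latch
        }

        try {
            // The wait happens without the lock, so the callback can proceed.
            boolean flushed = pendingSends.await(2, TimeUnit.SECONDS);
            networkThread.join();
            return flushed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Here {code}StopOutsideLockSketch.flushCompletesWhenLockReleased(){code} 
returns true, since the callback thread is never blocked by a thread that is 
itself waiting on the latch. Whether this ordering is safe for the real 
KafkaSystemProducer depends on what the lock is meant to protect.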




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
