[jira] [Updated] (SAMZA-1069) Deadlock between KafkaSystemProducer and KafkaProducer from kafka-clients lib

Yi Pan (Data Infrastructure) (JIRA) Thu, 22 Dec 2016 15:39:31 -0800

     [ https://issues.apache.org/jira/browse/SAMZA-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yi Pan (Data Infrastructure) updated SAMZA-1069:
------------------------------------------------
    Description: 
We have identified a deadlock between the main thread, which calls 
KafkaSystemProducer.close(), and the network thread of the KafkaProducer client 
library, which invokes the callback registered in KafkaSystemProducer.send().

The scenario is the following:
# The SamzaContainer main thread catches an exception from a previous commit and 
the container initiates shutdown. Shutdown calls KafkaSystemProducer.stop(), which 
acquires the synchronized producerLock in KafkaSystemProducer and then calls 
KafkaProducer.flush() to wait for all pending requests to complete.
# The KafkaProducer network I/O thread then invokes KafkaSystemProducer’s callback 
function (in RecordBatch.done()). The callback blocks on the same producerLock in 
KafkaSystemProducer, so the network thread cannot return from it, call 
producerFuture.done(), and release the CountDownLatch that the main thread is 
waiting on in KafkaSystemProducer.close(). Neither thread can proceed: deadlock.
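
The two steps above can be reproduced in miniature without Kafka. The sketch 
below is only an illustration under stated assumptions: DeadlockSketch, 
onSendComplete() and stopHoldingLock() are hypothetical names, the latch stands 
in for whatever flush() waits on, and a timeout is added so that the example 
terminates instead of hanging the way the real code would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class DeadlockSketch {
    // Stand-in for the producerLock in KafkaSystemProducer (simplified, hypothetical).
    private final Object producerLock = new Object();
    // Stand-in for the latch that the flush waits on until the pending send completes.
    private final CountDownLatch pendingSend = new CountDownLatch(1);

    // Simulates the send callback that runs on the network I/O thread.
    private void onSendComplete() {
        synchronized (producerLock) {
            // Callback bookkeeping would happen here.
        }
        // Only reached once the lock was obtained and released.
        pendingSend.countDown();
    }

    /**
     * Simulates stop(): takes the lock, then waits for the pending send.
     * Returns true if the wait finished within timeoutMs while the lock was held.
     */
    public boolean stopHoldingLock(long timeoutMs) {
        Thread network = new Thread(this::onSendComplete, "network-io-sketch");
        network.setDaemon(true);
        boolean flushed;
        try {
            synchronized (producerLock) {
                network.start();
                // The callback cannot enter its synchronized block, so this times out.
                flushed = pendingSend.await(timeoutMs, TimeUnit.MILLISECONDS);
            }
            // The lock is free again, so the callback can now finish.
            network.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return flushed;
    }

    public static void main(String[] args) {
        boolean flushed = new DeadlockSketch().stopHoldingLock(200);
        System.out.println("flush completed while holding lock: " + flushed);
    }
}
```

With an untimed wait, as in the scenario described above, the main thread would 
never leave the synchronized block and the callback would never acquire the lock.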

We need to make sure that KafkaSystemProducer.close() does not race with the 
callbacks triggered by the KafkaProducer's network thread.
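
One possible way to avoid the cycle, shown here only as a sketch and not as the 
fix adopted for this issue, is to hold the lock just long enough to record the 
shutdown and to wait for pending sends after releasing it. The class 
CloseWithoutLockSketch, the closed flag and the method names are hypothetical; 
the latch again stands in for whatever flush() waits on.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CloseWithoutLockSketch {
    // Stand-in for the producerLock in KafkaSystemProducer (simplified, hypothetical).
    private final Object producerLock = new Object();
    // Stand-in for the latch that the flush waits on until the pending send completes.
    private final CountDownLatch pendingSend = new CountDownLatch(1);
    // Hypothetical flag recording that shutdown has started; guarded by producerLock.
    private boolean closed = false;

    // Simulates the send callback that runs on the network I/O thread.
    private void onSendComplete() {
        synchronized (producerLock) {
            // Callback bookkeeping would happen here.
        }
        pendingSend.countDown();
    }

    /**
     * Simulates stop(): updates state under the lock, then waits outside it.
     * Returns true if the pending send completed within timeoutMs.
     */
    public boolean stop(long timeoutMs) {
        synchronized (producerLock) {
            closed = true; // the only work done while holding the lock
        }
        Thread network = new Thread(this::onSendComplete, "network-io-sketch");
        network.setDaemon(true);
        network.start();
        try {
            // The lock is not held here, so the callback can acquire it and count down.
            boolean flushed = pendingSend.await(timeoutMs, TimeUnit.MILLISECONDS);
            network.join();
            return flushed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        boolean flushed = new CloseWithoutLockSketch().stop(2000);
        System.out.println("flush completed without holding lock: " + flushed);
    }
}
```

Whether this is safe in KafkaSystemProducer depends on what else producerLock 
protects; the sketch only shows that waiting outside the lock removes the 
circular wait.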


  was:
We have identified one deadlock scenario between the main thread that calls 
KafkaSystemProducer.close() vs the KafkaProducer client lib's network thread 
that calls the callback function within KafkaSystemProducer.send().

The scenario is the following:
# SamzaContainer main thread caught an exception from previous commit and 
container initiated shutdown, which calls 
{code}KafkaSystemProducer.stop(){code}, grabbing the synchronized 
{code}producerLock{code} in {code}KafkaSystemProducer{code} and call 
{code}KafkaProducer.flush(){code} to wait for all pending requests to be done.
# {code}KafkaProducer{code} network I/O thread then calls KafkaSystemProducer’s 
callback function (in {code}RecordBatch.done(){code}), which is waiting on the 
same {code}producerLock{code} in {code}KafkaSystemProducer{code} before it can 
return and call {code}producerFuture.done(){code} and release the 
{code}CountDownLatch{code} that the main thread 
{code}KafkaSystemProducer.close(){code} is waiting on. Hence, deadlock!

We need to make sure the KafkaSystemProducer.close() won't have race condition 
w/ the callbacks triggered by the KafkaProducer's network thread.



> Deadlock between KafkaSystemProducer and KafkaProducer from kafka-clients lib
> -----------------------------------------------------------------------------
>
>                 Key: SAMZA-1069
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1069
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Yi Pan (Data Infrastructure)
>             Fix For: 0.12.0
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
