[jira] [Commented] (KAFKA-749) Bug in socket server shutdown logic makes the broker hang on shutdown until it has to be killed

Jay Kreps (JIRA) Mon, 04 Feb 2013 21:56:16 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571071#comment-13571071
 ]


Jay Kreps commented on KAFKA-749:
---------------------------------

The ugly part here is the extra layer of synchronization and signalling around 
the already synchronized blocking queue. This code is a bit hard to validate 
(for example, shouldn't it be signal instead of signalAll, since only one thing 
was added?), so it tends to get broken quickly by later people who don't 
understand it.

I don't quite understand why we can't just call clear on the queue and 
enqueue the AllDone object to achieve this. The ugliness of the previous 
implementation was that AllDone actually came out of the RequestChannel and 
that it was a ProducerRequest. This is easily fixed: there is no reason it 
should be a ProducerRequest, and the check for eq AllDone can be done in 
receiveRequest.
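
A minimal sketch of the clear-then-enqueue idea, in Java rather than the
broker's Scala. The class and method names (RequestChannelSketch, Request,
sendRequest, shutdown) are hypothetical and are not taken from the patch; the
sketch assumes the queue capacity is at least the number of handler threads,
and it does not by itself stop processor threads from refilling the queue
between the clear and the enqueue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class RequestChannelSketch {
    /** Hypothetical request type standing in for whatever the channel carries. */
    static class Request {
        final String name;
        Request(String name) { this.name = name; }
    }

    /** Shutdown sentinel: a plain Request, not a ProducerRequest. */
    static final Request ALL_DONE = new Request("AllDone");

    private final BlockingQueue<Request> queue;

    RequestChannelSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Blocks while the queue is full, like a processor thread would. */
    void sendRequest(Request request) {
        try {
            queue.put(request);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueuing", e);
        }
    }

    /**
     * Returns the next request, or null once the sentinel is dequeued.
     * The sentinel is compared by reference (Scala's eq) and never escapes
     * the channel.
     */
    Request receiveRequest() {
        try {
            Request request = queue.take();
            return request == ALL_DONE ? null : request;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while dequeuing", e);
        }
    }

    /** Drops pending requests, then enqueues one sentinel per handler thread. */
    void shutdown(int handlerCount) {
        queue.clear();
        for (int i = 0; i < handlerCount; i++) {
            sendRequest(ALL_DONE);
        }
    }

    public static void main(String[] args) {
        RequestChannelSketch channel = new RequestChannelSketch(4);
        channel.sendRequest(new Request("fetch"));
        channel.shutdown(1);
        // The pending request was cleared, so the handler sees the sentinel.
        System.out.println(channel.receiveRequest() == null); // prints "true"
    }
}
```

A handler loop would simply exit when receiveRequest returns null, so no
caller ever has to know what AllDone is.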
                
> Bug in socket server shutdown logic makes the broker hang on shutdown until 
> it has to be killed
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-749
>                 URL: https://issues.apache.org/jira/browse/KAFKA-749
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: bugs, p1
>         Attachments: kafka-749-v1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The current shutdown logic of the server shuts down the io threads first, 
> followed by the acceptor and finally the processor threads. The shutdown API 
> of the io threads enqueues a special AllDone command into the common request 
> queue, and an io thread shuts down when it dequeues this command. While 
> this shutdown is in progress on the io threads, the network/processor 
> threads can still accept new connections and requests, and will add those 
> new requests to the request queue. That means more requests can be enqueued 
> after the AllDone command. Once the io threads have shut down, no thread is 
> left to dequeue from the request queue, so the processor threads can hang 
> while adding new requests to a full request queue, thereby blocking the 
> server from shutting down.
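
The failure mode described above can be reproduced in isolation with a bounded
queue that has no consumer. This is only an illustrative sketch: the class
name, the string payloads, and the capacity of 2 are made up, and a timed
offer is used instead of put so that the program terminates rather than
hanging.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class FullQueueHangSketch {
    /** Returns true if the request was accepted within the timeout. */
    static boolean tryEnqueue(BlockingQueue<String> queue, String request, long timeoutMs) {
        try {
            return queue.offer(request, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // The io threads have already exited, so nothing drains this queue.
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(2);
        tryEnqueue(requestQueue, "late-request-1", 100);
        tryEnqueue(requestQueue, "late-request-2", 100);

        // The queue is now full. A plain put() here would block forever,
        // which is the hang a processor thread runs into during shutdown.
        boolean accepted = tryEnqueue(requestQueue, "late-request-3", 100);
        System.out.println("accepted = " + accepted); // prints "accepted = false"
    }
}
```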

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
