KAFKA-4360 issue

Json Tu Mon, 31 Oct 2016 20:22:28 -0700
> On Nov 1, 2016, at 10:54 AM, huxi (JIRA) <j...@apache.org> wrote:
> 
> 
>    [ 
> https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624155#comment-15624155
>  ] 
> 
> huxi commented on KAFKA-4360:
> -----------------------------
> 
> Excellent analysis! What intrigues me is whether this is a deadlock issue 
> or a liveness issue. Here is my analysis:
> 1. Say at time T1 the ZooKeeper session expires, so the 'handleNewSession' 
> method of SessionExpirationListener is executed and acquires the 
> controller lock (controllerContext.controllerLock).
> 2. It then invokes 'onControllerResignation' to have the current 
> controller resign, which shuts down the leader rebalance scheduler by 
> calling KafkaScheduler.shutdown.
> 3. The 'shutdown' method shuts down the ScheduledThreadPoolExecutor and 
> blocks until all tasks have completed execution after the shutdown request.
> 4. If any task was submitted before shutdown was called, the 
> check-imbalance thread starts by checking isActive, which acquires the 
> controller lock at the very beginning, so it is soon blocked because the 
> lock is already held by the main thread.
> 5. In that case, the main thread blocks in onControllerResignation until 
> one day has elapsed (the default) or the check thread is interrupted.
> 
> Does it make sense?
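
The five steps above can be reproduced in miniature outside Kafka. The following is only a sketch under stated assumptions, not Kafka's controller code: the class and method names are hypothetical, a plain ReentrantLock stands in for controllerContext.controllerLock, and the wait is bounded to a few hundred milliseconds instead of the one-day default mentioned in step 5 so that the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the controller; none of these names come from Kafka.
class ControllerLockSketch {
    // Assumption: plays the role of controllerContext.controllerLock.
    static final ReentrantLock controllerLock = new ReentrantLock();

    // Returns true if the scheduler terminated within waitMillis while the
    // calling thread was still holding the lock.
    static boolean shutdownWhileHoldingLock(long waitMillis) {
        ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);
        CountDownLatch taskStarted = new CountDownLatch(1);

        controllerLock.lock(); // step 1: the session-expiration callback takes the lock
        try {
            // step 4: a rebalance-check task that is already running needs the same lock
            scheduler.execute(() -> {
                taskStarted.countDown();
                controllerLock.lock(); // blocks while the caller holds the lock
                try {
                    // the imbalance check would run here
                } finally {
                    controllerLock.unlock();
                }
            });
            taskStarted.await();  // make sure the task has started before shutting down
            scheduler.shutdown(); // steps 2-3: stop accepting tasks, then wait for running ones
            // step 5: the wait cannot succeed while the lock is held by this thread
            return scheduler.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();    // idempotent; covers the interrupted path
            controllerLock.unlock(); // releasing the lock lets the worker finish
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated while lock held: " + shutdownWhileHoldingLock(200));
    }
}
```

In this sketch the worker finishes as soon as the caller releases the lock, so the stall lasts only as long as the timeout. That matches the "liveness" reading: with a one-day timeout the caller is stuck for a day rather than forever.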
> 
> 
>> Controller may deadLock when autoLeaderRebalance encounter zk expired
>> ---------------------------------------------------------------------
>> 
>>                Key: KAFKA-4360
>>                URL: https://issues.apache.org/jira/browse/KAFKA-4360
>>            Project: Kafka
>>         Issue Type: Bug
>>         Components: controller
>>   Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>>           Reporter: Json Tu
>>             Labels: bugfix
>>        Attachments: yf-mafka2-common02_jstack.txt
>> 
>>  Original Estimate: 168h
>> Remaining Estimate: 168h
>> 
>> When the controller has a checkAndTriggerPartitionRebalance task in 
>> autoRebalanceScheduler and the zk session expires at that time, it will 
>> run into a deadlock.
>> We can reconstruct the scenario as follows: when the zk session expires, 
>> the zk thread calls handleNewSession, which is defined in 
>> SessionExpirationListener, and acquires controllerContext.controllerLock. 
>> It then calls autoRebalanceScheduler.shutdown(), which needs to complete 
>> all the tasks in autoRebalanceScheduler, but that thread pool also needs 
>> to acquire controllerContext.controllerLock, which is already owned by the 
>> zk callback thread, so it runs into a deadlock.
>> This causes at least two problems. First, the broker's id cannot be 
>> registered in ZooKeeper, so the broker will be considered dead by the new 
>> controller. Second, the process cannot be stopped by 
>> kafka-server-stop.sh, because the shutdown function cannot acquire 
>> controllerContext.controllerLock either; we cannot shut down Kafka 
>> except by using kill -9.
>> In my attachment, I uploaded a jstack file, which was created when my 
>> Kafka process could not be shut down by kafka-server-stop.sh.
>> I have hit this scenario several times. I think this may be a bug that 
>> has not been fixed in Kafka.
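
The second symptom (the shutdown path cannot get the lock either) can be illustrated the same way. Again this is a hedged sketch with hypothetical names, not Kafka code: one thread plays the zk callback that holds the lock while it is stuck, and the caller plays the shutdown path. The sketch uses tryLock with a timeout purely so the demo returns; it assumes that a real shutdown path using a plain blocking lock() would simply hang at this point.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; none of these names come from Kafka.
class ShutdownBlockedSketch {
    // Assumption: plays the role of controllerContext.controllerLock.
    static final ReentrantLock controllerLock = new ReentrantLock();

    // Returns true if the "shutdown" caller obtained the lock within waitMillis.
    static boolean shutdownCanAcquireLock(long waitMillis) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Plays the zk callback thread: holds the lock while waiting on something else.
        Thread zkCallback = new Thread(() -> {
            controllerLock.lock();
            try {
                lockHeld.countDown();
                release.await(); // stands in for the stalled scheduler shutdown
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                controllerLock.unlock();
            }
        });
        zkCallback.start();

        try {
            lockHeld.await();
            // Plays the shutdown path: it needs the same lock and cannot get it.
            boolean acquired = controllerLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                controllerLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the stand-in callback thread exit
            try {
                zkCallback.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("shutdown acquired lock: " + shutdownCanAcquireLock(200));
    }
}
```

A thread dump taken while the caller is waiting would show one thread owning the lock and another parked on it, which is the kind of evidence a jstack file like the attached one can provide.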
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)