[ https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625317#comment-15625317 ]

Json Tu edited comment on KAFKA-4360 at 11/1/16 12:38 PM:
----------------------------------------------------------

I have opened a pull request, https://github.com/apache/kafka/pull/2085. Can
someone review it?
[~becket_qin] [~junrao] [~guozhang]



> Controller may deadLock when autoLeaderRebalance encounter zk expired
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-4360
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4360
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>            Reporter: Json Tu
>              Labels: bugfix
>         Attachments: deadlock_patch, yf-mafka2-common02_jstack.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When the controller has a checkAndTriggerPartitionRebalance task in
> autoRebalanceScheduler and the ZooKeeper session expires at that moment, the
> controller can run into a deadlock.
>
> The scenario unfolds as follows. When the ZK session expires, the ZK event
> thread calls handleNewSession(), defined in SessionExpirationListener. It
> acquires controllerContext.controllerLock and then calls
> autoRebalanceScheduler.shutdown(), which waits for all tasks in
> autoRebalanceScheduler to complete. Those tasks run on the scheduler's
> thread pool and also need controllerContext.controllerLock, but the lock is
> already held by the ZK callback thread. Neither thread can make progress, so
> the controller deadlocks.
>
> This causes at least two problems. First, the broker's id cannot be
> registered in ZooKeeper, so the new controller considers the broker dead.
> Second, the process cannot be stopped with kafka-server-stop.sh, because the
> shutdown function also needs controllerContext.controllerLock. The only way
> to stop Kafka is kill -9.
>
> The attached jstack file was captured while my Kafka process could not be
> shut down with kafka-server-stop.sh.
> I have hit this scenario several times, and I believe it is a bug that has
> not yet been fixed in Kafka.
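The lock cycle described in the report can be reproduced in miniature with plain java.util.concurrent classes. This is a hedged sketch, not Kafka code: the class RebalanceDeadlockDemo, the method shutdownWhileHoldingLock, the ReentrantLock named controllerLock and the single-thread executor are all hypothetical stand-ins for controllerContext.controllerLock and autoRebalanceScheduler. The report only says that shutdown waits for the scheduler's tasks to complete; the sketch models that wait with a bounded awaitTermination so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class RebalanceDeadlockDemo {
    // Hypothetical stand-in for controllerContext.controllerLock.
    static final ReentrantLock controllerLock = new ReentrantLock();

    // Returns true if the scheduler could not drain within waitMillis because
    // its task was blocked on the lock held by the calling thread.
    static boolean shutdownWhileHoldingLock(long waitMillis) {
        // Hypothetical stand-in for autoRebalanceScheduler.
        ExecutorService scheduler = Executors.newSingleThreadExecutor();
        CountDownLatch taskStarted = new CountDownLatch(1);

        controllerLock.lock(); // like handleNewSession() taking controllerLock
        try {
            // Like checkAndTriggerPartitionRebalance: the task needs the same lock.
            scheduler.submit(() -> {
                taskStarted.countDown();
                controllerLock.lock(); // blocks: the caller thread holds the lock
                try {
                    // rebalance work would happen here
                } finally {
                    controllerLock.unlock();
                }
            });
            taskStarted.await();

            // Still holding controllerLock, stop the scheduler and wait for its
            // running task to finish. The task can never acquire the lock.
            scheduler.shutdown();
            return !scheduler.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Unlike the reported case, the sketch gives up and releases the lock,
            // which lets the blocked task finish so the JVM can exit.
            controllerLock.unlock();
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // prints "stuck = true"
        System.out.println("stuck = " + shutdownWhileHoldingLock(500));
    }
}
```

In the sketch the wait is bounded, so the lock is eventually released. In the reported case the wait does not end, so neither the rebalance task nor a later call to the shutdown function can ever acquire controllerLock.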



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
