[ 
https://issues.apache.org/jira/browse/KAFKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696340#comment-15696340
 ] 

Json Tu edited comment on KAFKA-4447 at 11/26/16 1:16 AM:
----------------------------------------------------------

After checking the responses on the dev mailing list, I reviewed the Kafka 
code again. I guess the reason may be as follows.
1. As [~guozhang] said, "unsubscribeChildChanges" on ZkClient and the 
procedure that fires listeners are executed on different threads.
2. The ZkClient event thread, which processes callbacks from the ZK server, is 
a single thread, and it may have many callbacks queued after the controller's 
SessionExpirationListener callback, such as 
ReassignedPartitionsIsrChangeListener, IsrChangeNotificationListener and so on.
3. So although the SessionExpirationListener callback deregisters all 
listeners at the end, the other callbacks that were already queued still have 
to run after this controller resigns.
4. That is why the controller log in the attachment shows that it still acted 
as a controller, and this continued for about 3 minutes.
5. I think it took so long because we were doing a partition reassignment, and 
my Kafka cluster's environment is not very stable, so some brokers expired 
from the ZK server, which triggered further callbacks that the controller 
listens for.
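
To illustrate points 2 and 3, here is a minimal single-threaded sketch in 
Java. It is a simplified model and not ZkClient's or the controller's actual 
code: the class QueuedCallbackSketch, the Listener interface, and the methods 
subscribe, unsubscribeAll, fire, and drain are all hypothetical names. It 
assumes each queued event captures its listener at enqueue time, so clearing 
the listener set does not cancel events that are already waiting in the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Simplified model of a single event thread with a FIFO callback queue.
// All names here are hypothetical; this is not ZkClient's real API.
class QueuedCallbackSketch {
    /** Hypothetical stand-in for a ZkClient listener type. */
    interface Listener {
        void handle(List<String> log);
    }

    private final Set<Listener> listeners = new LinkedHashSet<>();
    private final Queue<Runnable> eventQueue = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();

    void subscribe(Listener listener) {
        listeners.add(listener);
    }

    void unsubscribeAll() {
        listeners.clear();
    }

    /** A watch fired: enqueue one event per currently registered listener. */
    void fire() {
        for (Listener listener : listeners) {
            eventQueue.add(() -> listener.handle(log));
        }
    }

    /** The single event thread: run queued events in FIFO order. */
    void drain() {
        Runnable event;
        while ((event = eventQueue.poll()) != null) {
            event.run();
        }
    }

    static List<String> demo() {
        QueuedCallbackSketch zk = new QueuedCallbackSketch();
        zk.subscribe(log -> log.add("ReassignedPartitionsIsrChangeListener fired"));
        zk.subscribe(log -> log.add("IsrChangeNotificationListener fired"));

        // The session expiration callback is queued first ...
        zk.eventQueue.add(() -> {
            zk.log.add("SessionExpirationListener: resign, unsubscribe all");
            zk.unsubscribeAll();
        });
        // ... and listener events are queued behind it before it has run.
        zk.fire();

        zk.drain();   // the already queued listener events still run
        zk.fire();    // nothing is enqueued now: no listeners remain
        zk.drain();
        return zk.log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this model the two listener callbacks run after the resignation step, 
because they were enqueued before the listeners were removed. Only events 
fired after the queue has drained find no listeners left.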

Can you give me some suggestions? [~guozhang] [~becket_qin]




> Controller resigned but it also acts as a controller for a long time 
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-4447
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4447
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>         Environment: Linux Os
>            Reporter: Json Tu
>         Attachments: log.tar.gz
>
>
> We have a cluster with 10 nodes, and we performed the following operations.
> 1. We reassigned some topic partitions from one node to the other 9 nodes in 
> the cluster, which triggered the controller.
> 2. The controller invoked PartitionsReassignedListener's handleDataChange, 
> read all partition reassignment rules from the ZK path, and executed 
> onPartitionReassignment for every partition that matched the conditions.
> 3. But the controller's session expired from ZK, after which some of the 
> other 9 nodes also expired from ZK.
> 5. The controller then invoked onControllerResignation to resign as the 
> controller.
> We found that after the controller resigned, it still acted as the controller 
> for about 3 minutes, which can be seen in the attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
