Hao Zhang created HELIX-742:
-------------------------------

             Summary: ZkHelixManager should consider session expire when 
detecting connection flapping
                 Key: HELIX-742
                 URL: https://issues.apache.org/jira/browse/HELIX-742
             Project: Apache Helix
          Issue Type: Task
            Reporter: Hao Zhang


In production we are seeing an issue caused by an infinite session
expiry/reconnect loop. Each iteration causes live instance changes, which
trigger massive state transitions. As a result, the controller overloads ZK with
thousands of messages and brings down the cluster.

 

Currently, when ZkHelixManager detects connection flapping, it only counts
disconnects, not session expiries. We need to take session expiry into
consideration as well.
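
To illustrate the proposed change, here is a minimal Java sketch of a
sliding-window flapping detector that counts session expiries alongside
disconnects. The class FlappingDetector, its EventType enum, and the
window/threshold parameters are hypothetical; this is not Helix's actual API
or the current ZkHelixManager implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

/**
 * Hypothetical sketch (not the real ZkHelixManager code): a sliding-window
 * flapping detector where disconnects AND session expiries both count.
 */
class FlappingDetector {
    /** Kinds of ZK connection events the detector counts (hypothetical). */
    public enum EventType { DISCONNECT, SESSION_EXPIRED }

    private final long windowMs;          // length of the sliding time window
    private final int maxEventsInWindow;  // more events than this => flapping
    private final Deque<Long> eventTimes = new ArrayDeque<>();

    FlappingDetector(long windowMs, int maxEventsInWindow) {
        if (windowMs <= 0 || maxEventsInWindow <= 0) {
            throw new IllegalArgumentException("window and threshold must be positive");
        }
        this.windowMs = windowMs;
        this.maxEventsInWindow = maxEventsInWindow;
    }

    /**
     * Records one event at timeMs (callers pass non-decreasing timestamps) and
     * returns true if the events in (timeMs - windowMs, timeMs] exceed the threshold.
     */
    synchronized boolean record(EventType type, long timeMs) {
        Objects.requireNonNull(type, "type");
        // Both event types go into the same window; a disconnect-only counter
        // would skip SESSION_EXPIRED here and miss an expiry/reconnect loop.
        eventTimes.addLast(timeMs);
        // Drop events that have fallen out of the window.
        while (!eventTimes.isEmpty() && eventTimes.peekFirst() <= timeMs - windowMs) {
            eventTimes.pollFirst();
        }
        return eventTimes.size() > maxEventsInWindow;
    }

    /** Number of events currently inside the window. */
    synchronized int eventsInWindow() {
        return eventTimes.size();
    }
}
```

With a 60-second window and a threshold of 3, one disconnect followed by three
session expiries within a few seconds is reported as flapping, whereas a
counter that only looked at disconnects would have seen a single event.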

 

AC:
 * follow up this ticket with a plan to consolidate semantics and behavior
 * Complete the code change and test it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
