[jira] [Created] (ZOOKEEPER-1465) Cluster availability following new leader election takes a long time with large datasets - is correlated to dataset size

Alex Gvozdenovic (JIRA) Thu, 10 May 2012 07:48:16 -0700

Alex Gvozdenovic created ZOOKEEPER-1465:
-------------------------------------------


             Summary: Cluster availability following new leader election takes a long time with large datasets - is correlated to dataset size
                 Key: ZOOKEEPER-1465
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1465
             Project: ZooKeeper
          Issue Type: Bug
          Components: leaderElection
    Affects Versions: 3.4.3
            Reporter: Alex Gvozdenovic
             Fix For: 3.4.4


When a new leader is elected, the cluster takes a long time to become 
available if the dataset is large.

Test Data
----------
650 MB snapshot size
20k nodes of varied size
3-member cluster

On 3.4.x branch 
(http://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4?r=1244779)
------------------------------------------------------------------------------------------

Takes 3-4 minutes to bring up a cluster from cold 
Takes 40-50 secs to recover from a leader failure 
Takes 10 secs for a new follower to join the cluster 

Using the 3.3.5 release on the same hardware with the same dataset
-----------------------------------------------------------------

Takes 10-20 secs to bring up a cluster from cold 
Takes 10 secs to recover from a leader failure 
Takes 10 secs for a new follower to join the cluster 
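
To put the two sets of timings side by side, the following is a minimal sketch that turns the ranges reported above into slowdown factors. The dictionary layout and the helper name slowdown_range are illustrative only; the numbers are the ones listed in this report, with minutes converted to seconds.

```python
# Timings reported in this issue, in seconds, as (low, high) ranges.
timings = {
    "cold start":     {"3.4.x": (180, 240), "3.3.5": (10, 20)},
    "leader failure": {"3.4.x": (40, 50),   "3.3.5": (10, 10)},
    "follower join":  {"3.4.x": (10, 10),   "3.3.5": (10, 10)},
}


def slowdown_range(new, old):
    """Return the (smallest, largest) slowdown factor of `new` relative to `old`."""
    # Smallest factor: fastest new time against slowest old time.
    # Largest factor: slowest new time against fastest old time.
    return (new[0] / old[1], new[1] / old[0])


slowdowns = {
    name: slowdown_range(t["3.4.x"], t["3.3.5"]) for name, t in timings.items()
}

for name, (low, high) in slowdowns.items():
    print(f"{name}: {low:.0f}x to {high:.0f}x slower on 3.4.x")
```

On these figures the cold start is roughly 9x to 24x slower and leader recovery 4x to 5x slower, while a follower joining is unchanged.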

I can see from the logs in 3.4.x that once a new leader is elected, it pushes a 
new snapshot to each of the followers. Each follower must save the snapshot 
before it acks the leader, and only then can the leader mark the cluster as 
available. 

The kit being used is a low-spec VM, so the absolute times are not relevant 
per se. What matters is that a snapshot is always sent even though there is no 
difference between the persisted state on each peer.
No data is being added to the cluster while the peers are being restarted.
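
The behaviour I would expect can be sketched roughly as below. This is a simplified model and not ZooKeeper's code: SyncMode, choose_sync_mode and the oldest_retained_zxid parameter are hypothetical names, and the only assumption is that each peer can report the id of the last transaction it has persisted (its zxid).

```python
from enum import Enum


class SyncMode(Enum):
    NONE = "nothing to send"     # follower already matches the leader
    DIFF = "send missing txns"   # follower is behind, but only by retained txns
    SNAP = "send full snapshot"  # follower cannot be caught up incrementally


def choose_sync_mode(leader_zxid, follower_zxid, oldest_retained_zxid):
    """Pick how a new leader should bring one follower up to date.

    Hypothetical helper: all three arguments are transaction ids, where
    oldest_retained_zxid is the oldest transaction the leader can still replay.
    """
    if follower_zxid == leader_zxid:
        # Persisted state is identical, so no snapshot transfer is needed.
        return SyncMode.NONE
    if oldest_retained_zxid <= follower_zxid < leader_zxid:
        # The gap is covered by transactions the leader still holds.
        return SyncMode.DIFF
    # Anything else falls back to a full snapshot in this simplified model.
    return SyncMode.SNAP


# The scenario in this report: no writes during the restart, so the peers agree.
print(choose_sync_mode(leader_zxid=100, follower_zxid=100, oldest_retained_zxid=50))
```

In the test described above every peer would fall into the first branch, so the time to become available should not depend on the snapshot size.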








--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
