Re: [Zookeeper-user] Leader election
The easiest way to fix this code is to move the Collections.sort(values) to right after the zk.getChildren() and then use the following Comparator with Collections.sort() and Collections.binarySearch(): /* This Comparator defines an ordering such that the strings with * the lowest sequence numbers are first in sequence sorted order * followed by strings without sequence numbers in lexographical * order. This class assumes that a '-' preceeds the sequence * number. */ class SequenceComparator implements ComparatorString { @Override public int compare(String o1, String o2) { int s1 = getSequence(o1); int s2 = getSequence(o2); if (s1 == -1 s2 == -1) { return o1.compareTo(o2); } return s1 == -1 ? 1 : s2 == -1 ? -1 : s1 - s2 ? : -1; } /* Returns the sequence suffix of s. This method assumes that * the sequence number is prefixed with a '-'. */ private int getSequence(String s) { int i = s.lastIndexOf('-'); if (i != -1) { try { return Integer.parseInt(s.substring(i+1)); // If an exception occurred we misdetected a sequence suffix, // so return -1. } catch(NumberFormatException e) { } catch(ArrayIndexOutOfBoundsException e) { } } return -1; } } ben On Thursday 10 July 2008 22:20:06 Avinash Lakshman wrote: Hi I am trying to elect leader among 50 nodes. There is always one odd guy who seems to think that someone else distinct from what some other nodes see as leader. Could someone please tell me what is wrong with the following code for leader election: public void electLeader() { ZooKeeper zk = StorageService.instance().getZooKeeperHandle(); String path = /Leader; try { String createPath = path + /L-; LeaderElector.createLock_.lock(); while( true ) { /* Get all znodes under the Leader znode */ ListString values = zk.getChildren(path, false); /* * Get the first znode and if it is the * pathCreated created above then the data * in that znode is the leader's identity. */ if ( leader_ == null ) { leader_ = new AtomicReferenceEndPoint( EndPoint.fromBytes( zk.getData(path + / + values.get(0), false, null) ) ); } else { leader_.set( EndPoint.fromBytes( zk.getData(path + / + values .get(0), false, null) ) ); /* Disseminate the state as to who the leader is. */ onLeaderElection(); } logger_.debug(Elected leader is + leader_ + @ znode + ( path + / + values.get(0) ) ); Collections.sort(values); /* We need only the last portion of this znode */ String[] peices = pathCreated_.split(/); int index = Collections.binarySearch(values, peices[peices.length - 1]); if ( index 0 ) { String pathToCheck = path + / + values.get(index - 1); Stat stat = zk.exists(pathToCheck, true); if ( stat != null ) { logger_.debug(Awaiting my turn ...); condition_.await(); logger_.debug(Checking to see if leader is around ...); } } else { break; } } } catch ( InterruptedException ex ) { logger_.warn(LogUtil.throwableToString(ex)); } catch ( KeeperException ex ) { logger_.warn(LogUtil.throwableToString(ex)); } finally { LeaderElector.createLock_.unlock(); } } } Thanks Avinash - Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW! Studies have shown that voting for your favorite open source project, along with a healthy diet, reduces your potential for chronic lameness and boredom. Vote Now at http://www.sourceforge.net/community/cca08 ___ Zookeeper-user mailing list Zookeeper-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/zookeeper-user
Re: [Zookeeper-user] Leader election
I don't see leader election documented on the ZK recipies wiki: http://zookeeper.wiki.sourceforge.net/ZooKeeperRecipes Jacob, would you mind updating the wiki page, documenting this recipe? Thanks! Patrick Jacob Levy wrote: *Avinash* The following protocol will help you fix the observed misbehavior. As Flavio points out, you cannot rely on the order of nodes in getChildren, you must use an intrinsic property of each node to determine who is the leader. The protocol devised by Runping Qi and described here will do that. First of all, when you create child nodes of the node that holds the leadership bids, you must create them with the EPHEMERAL and SEQUENCE flag. ZooKeeper guarantees to give you an ephemeral node named uniquely and with a sequence number larger by at least one than any previously created node in the sequence. You provide a prefix, like “L_” or your own choice, and ZooKeeper creates nodes named “L_23”, “L_24”, etc. The sequence number starts at 0 and increases monotonously. Once you’ve placed your leadership bid, you search backwards from the sequence number of **your** node to see if there are any preceding (in terms of the sequence number) nodes. When you find one, you place a watch on it and wait for it to disappear. When you get the watch notification, you search again, until you do not find a preceding node, then you know you’re the leader. This protocol guarantees that there is at any time only one node that thinks it is the leader. But it does not disseminate information about who is the leader. If you want everyone to know who is the leader, you can have an additional Znode whose value is the name of the current leader (or some identifying information on how to contact the leader, etc.). Note that this cannot be done atomically, so by the time other nodes find out who the leader is, the leadership may already have passed on to a different node. *Flavio* Might it make sense to provide a standardized implementation of leader election in the library code in Java? --Jacob *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] *On Behalf Of *Flavio Junqueira *Sent:* Friday, July 11, 2008 1:02 AM *To:* zookeeper-user@lists.sourceforge.net *Cc:* [EMAIL PROTECTED] *Subject:* Re: [Zookeeper-user] Leader election Hi Avinash, getChildren returns a list in lexicographic order, so if you are updating the children of the election node concurrently, then you may get a different first node with different clients. If you are using the sequence flag to create nodes, then you may consider stripping the prefix of the node name and using the sufix value to determine order. Hope it helps. -Flavio - Original Message From: Avinash Lakshman [EMAIL PROTECTED] To: zookeeper-user@lists.sourceforge.net Sent: Friday, July 11, 2008 7:20:06 AM Subject: [Zookeeper-user] Leader election Hi I am trying to elect leader among 50 nodes. There is always one odd guy who seems to think that someone else distinct from what some other nodes see as leader. Could someone please tell me what is wrong with the following code for leader election: public void electLeader() { ZooKeeper zk = StorageService.instance().getZooKeeperHandle(); String path = /Leader; try { String createPath = path + /L-; LeaderElector.createLock_.lock(); while( true ) { /* Get all znodes under the Leader znode */ ListString values = zk.getChildren(path, false); /* * Get the first znode and if it is the * pathCreated created above then the data * in that znode is the leader's identity. */ if ( leader_ == null ) { leader_ = new AtomicReferenceEndPoint( EndPoint.fromBytes( zk.getData(path + / + values.get(0), false, null) ) ); } else { leader_.set( EndPoint.fromBytes( zk.getData(path + / + values .get(0), false, null) ) ); /* Disseminate the state as to who the leader is. */ onLeaderElection(); } logger_.debug(Elected leader is + leader_ + @ znode + ( path + / + values.get(0) ) ); Collections.sort(values); /* We need only the last portion of this znode */ String[] peices = pathCreated_.split(/); int index = Collections.binarySearch(values, peices
Re: [Zookeeper-user] Leader election
Hi Avinash, getChildren returns a list in lexicographic order, so if you are updating the children of the election node concurrently, then you may get a different first node with different clients. If you are using the sequence flag to create nodes, then you may consider stripping the prefix of the node name and using the sufix value to determine order. Hope it helps. -Flavio - Original Message From: Avinash Lakshman [EMAIL PROTECTED] To: zookeeper-user@lists.sourceforge.net Sent: Friday, July 11, 2008 7:20:06 AM Subject: [Zookeeper-user] Leader election Hi I am trying to elect leader among 50 nodes. There is always one odd guy who seems to think that someone else distinct from what some other nodes see as leader. Could someone please tell me what is wrong with the following code for leader election: public void electLeader() { ZooKeeper zk = StorageService.instance().getZooKeeperHandle(); String path = /Leader; try { String createPath = path + /L-; LeaderElector.createLock_.lock(); while( true ) { /* Get all znodes under the Leader znode */ ListString values = zk.getChildren(path, false); /* * Get the first znode and if it is the * pathCreated created above then the data * in that znode is the leader's identity. */ if ( leader_ == null ) { leader_ = new AtomicReferenceEndPoint( EndPoint.fromBytes( zk.getData(path + / + values.get(0), false, null) ) ); } else { leader_.set( EndPoint.fromBytes( zk.getData(path + / + values .get(0), false, null) ) ); /* Disseminate the state as to who the leader is. */ onLeaderElection(); } logger_.debug(Elected leader is + leader_ + @ znode + ( path + / + values.get(0) ) ); Collections.sort(values); /* We need only the last portion of this znode */ String[] peices = pathCreated_.split(/); int index = Collections.binarySearch(values, peices[peices.length - 1]); if ( index 0 ) { String pathToCheck = path + / + values.get(index - 1); Stat stat = zk.exists(pathToCheck, true); if ( stat != null ) { logger_.debug(Awaiting my turn ...); condition_.await(); logger_.debug(Checking to see if leader is around ...); } } else { break; } } } catch ( InterruptedException ex ) { logger_.warn(LogUtil.throwableToString(ex)); } catch ( KeeperException ex ) { logger_.warn(LogUtil.throwableToString(ex)); } finally { LeaderElector.createLock_.unlock(); } } } Thanks Avinash - Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW! Studies have shown that voting for your favorite open source project, along with a healthy diet, reduces your potential for chronic lameness and boredom. Vote Now at http://www.sourceforge.net/community/cca08___ Zookeeper-user mailing list Zookeeper-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/zookeeper-user
Re: [Zookeeper-user] Leader Election
Good point. The recipe we show guarantees there will be a single leader elected, but only the leader knows it. Jacob Levy has been implementing a client library to do leader election, so he should really chime in here, but just in case he doesn't: I believe Jacob's solution was for the leader to create an ephemeral znode called LEADER with its id as the data when it becomes the leader, and then delete the node before relinquishing leadership. The other nodes then watch for the existence of the LEADER znode to see leadership changes. ben On Tuesday 17 June 2008 09:28:39 Avinash Lakshman wrote: Hi All I am trying to write a simple leader election module and I have 5 nodes A, B, C, D and E amongst which I need to elect a leader. Now I am following the example using SEQUENCE flags and trying to use the technique where the herd effect can be done away with. So I have A create a znode L-1, B create znode L-2 and E create znode L-5. After this I have L-2 watch L-1, L-3 watch L-2 etc. Let us assume A was elected leader. When A dies B should automatically become the leader and this seems to be working. What I need to know is how to C, D and E know about this? Do I need another mechanism to disseminate this information? I ask because not all znodes are being watched i.e C, D and E are not watching for L-1 which is the znode created by A. So how will they learn as to who the new leader is since no watch event will be triggered at their end. Thanks in advance Avinash - Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ___ Zookeeper-user mailing list Zookeeper-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/zookeeper-user
Re: [Zookeeper-user] Leader Election
Ben described the general outline of the protocol we implemented, which is an improvement on the recipe to avoid a herd effect every time that the leader changed. This improvement was actually suggested by Runping Qi of Yahoo!. The recipe protocol requires all clients to recomputed if they are now the leader, when the current leader either relinquishes leadership or disconnects. The improved protocol guarantees that only one client will need to recompute. Here's the algorithm: A persistent ZNode is used to be the parent of one or more ephemeral ZNodes. These ephemeral ZNodes represent the bids of different clients to become the leader. When a client wants to bid to become the leader, it creates an ephemeral sequence node and records the sequence number. Then, to compute if it is the leader, the client scans backwards from the sequence number it was assigned till 0, to find any preceding bids. If a preceding bid is found, the client places a watch on that ZNode, so it is informed when that ZNode is deleted. The deletion represents the owner client relinquishing leadership or disconnecting. When the watch event is received by a client, it scans backwards from its assigned sequence number to 0, to find a preceding bid. If none is found, then this client is now the leader. If a preceding bid is found, the client places a new watch on the ZNode, and waits again. Note that this protocol handles the situation when the current leader disconnects or abdicates, as well as the situation where a preceding-bid but non-leader client disconnects. In both cases, only one client gets a watch notification, so no herd effect is observed. Please ask if you need more details. This protocol will be part of the client library I'm implementing -- however do not get your hopes up too high, because at this time I do not know whether the library will be released outside of Yahoo!. --Jacob -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Benjamin Reed Sent: Tuesday, June 17, 2008 10:27 AM To: zookeeper-user@lists.sourceforge.net Subject: Re: [Zookeeper-user] Leader Election Good point. The recipe we show guarantees there will be a single leader elected, but only the leader knows it. Jacob Levy has been implementing a client library to do leader election, so he should really chime in here, but just in case he doesn't: I believe Jacob's solution was for the leader to create an ephemeral znode called LEADER with its id as the data when it becomes the leader, and then delete the node before relinquishing leadership. The other nodes then watch for the existence of the LEADER znode to see leadership changes. ben On Tuesday 17 June 2008 09:28:39 Avinash Lakshman wrote: Hi All I am trying to write a simple leader election module and I have 5 nodes A, B, C, D and E amongst which I need to elect a leader. Now I am following the example using SEQUENCE flags and trying to use the technique where the herd effect can be done away with. So I have A create a znode L-1, B create znode L-2 and E create znode L-5. After this I have L-2 watch L-1, L-3 watch L-2 etc. Let us assume A was elected leader. When A dies B should automatically become the leader and this seems to be working. What I need to know is how to C, D and E know about this? Do I need another mechanism to disseminate this information? I ask because not all znodes are being watched i.e C, D and E are not watching for L-1 which is the znode created by A. So how will they learn as to who the new leader is since no watch event will be triggered at their end. Thanks in advance Avinash - Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ___ Zookeeper-user mailing list Zookeeper-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/zookeeper-user - Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php ___ Zookeeper-user mailing list Zookeeper-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/zookeeper-user