[jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934202#action_12934202 ] Flavio Junqueira commented on ZOOKEEPER-335:

Radu, it sounds like the problem you mention has been resolved in ZOOKEEPER-790. I'm not sure which version you're using, but perhaps you should consider moving to 3.3.2.

zookeeper servers should commit the new leader txn to their logs.

Key: ZOOKEEPER-335
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.1.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
Fix For: 3.4.0
Attachments: faultynode-vishal.txt, zk.log.gz, zklogs.tar.gz, ZOOKEEPER-790.travis.log.bz2

Currently the zookeeper followers do not commit the new leader transaction. In a failure scenario this can lead to a follower acking the same leader txn id twice, to what might be two different intermittent leaders, allowing them to propose two different txns with the same zxid.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
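The collision scenario above hinges on how a zxid is laid out: the leader epoch lives in the high 32 bits and a per-epoch counter in the low 32 bits, so two leaders proposing under the same epoch can mint the same zxid for different txns. A minimal sketch of that layout (helper names are illustrative, not ZooKeeper's API):

```java
// Illustrative zxid packing: epoch in the high 32 bits, counter in the low 32.
public class Zxid {
    static long make(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }
    static long epochOf(long zxid) { return zxid >> 32L; }
    static long counterOf(long zxid) { return zxid & 0xffffffffL; }

    public static void main(String[] args) {
        long z = make(5, 7);
        if (epochOf(z) != 5 || counterOf(z) != 7) {
            throw new AssertionError("unexpected zxid layout");
        }
        // Two leaders in the same epoch proposing the 7th txn collide on zxid,
        // even if the txn contents differ; committing the new-leader txn bumps
        // the epoch and avoids this.
        if (make(5, 7) != z) {
            throw new AssertionError("expected identical zxids");
        }
        System.out.println("ok");
    }
}
```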
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933718#action_12933718 ] Flavio Junqueira commented on ZOOKEEPER-880:

One problem here is that we had some discussions over IRC and the information is not reflected here. If you have a look at the logs, you'll observe this:

{noformat}
2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection request /10.10.20.5:41861
2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection request: 0
2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Address of remote peer: 0
2010-09-28 10:31:22,229 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken:
java.io.IOException: Channel eof
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595)
{noformat}

If I remember the discussion with J-D correctly, the node trying to connect is running Nagios. My conjecture at the time was that the IOException was killing the receiver thread, but not the sender thread (RecvWorker.finish() does not close its SendWorker counterpart). Your point is good, but it sounds like the race you mention would have to be triggered continuously to cause the number of SendWorker threads to grow steadily. That seems unlikely to me.
QuorumCnxManager$SendWorker grows without bounds

Key: ZOOKEEPER-880
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Priority: Critical
Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz

We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:

{noformat}
tickTime=3000
dataDir=/somewhere_thats_not_tmp
clientPort=2181
initLimit=10
syncLimit=5
server.0=sv4borg9:2888:3888
server.1=sv4borg10:2888:3888
server.2=sv4borg11:2888:3888
server.3=sv4borg12:2888:3888
server.4=sv4borg13:2888:3888
{noformat}

The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
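The orphan-thread conjecture above (RecvWorker dying on an IOException without taking down its SendWorker counterpart) can be modeled in a few lines. This is a standalone sketch with simplified bookkeeping, not the actual QuorumCnxManager code; only the class names mirror the real ones:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of per-peer worker pairs: the fix under discussion is for
// RecvWorker.finish() to also tear down the SendWorker for the same sid, so a
// broken connection ("Channel eof") cannot leave a sender thread orphaned.
public class WorkerPair {
    static class SendWorker {
        volatile boolean running = true;
        void finish() { running = false; }
    }

    static class RecvWorker {
        final long sid;
        final Map<Long, SendWorker> senders;
        volatile boolean running = true;

        RecvWorker(long sid, Map<Long, SendWorker> senders) {
            this.sid = sid;
            this.senders = senders;
        }

        // Called on IOException: stop receiving AND close the paired sender.
        void finish() {
            running = false;
            SendWorker sw = senders.remove(sid);
            if (sw != null) {
                sw.finish();
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, SendWorker> senders = new ConcurrentHashMap<>();
        SendWorker sw = new SendWorker();
        senders.put(3L, sw);
        RecvWorker rw = new RecvWorker(3L, senders);

        rw.finish(); // simulate "Connection broken: Channel eof"
        if (sw.running || senders.containsKey(3L)) {
            throw new AssertionError("SendWorker was orphaned");
        }
        System.out.println("ok");
    }
}
```

Without the counterpart shutdown, each broken connection would leak one SendWorker, which matches the steady thread growth reported in this issue.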
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933719#action_12933719 ] Flavio Junqueira commented on ZOOKEEPER-934:

One more comment. Looking at the logs for ZOOKEEPER-880, I remembered that in their case the RecvWorker thread was able to read a valid id from the connection with a Nagios server. I'm not exactly sure how that happened, but it essentially tells us that the simple check you proposed might not be enough. We don't want a Nagios box impersonating a ZooKeeper server! :-)

Add sanity check for server ID

Key: ZOOKEEPER-934
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934
Project: Zookeeper
Issue Type: Sub-task
Reporter: Vishal K
Fix For: 3.4.0

2. Should I add a check to reject connections from peers that are not listed in the configuration file? Currently, we are not doing any sanity check for server IDs. I think this might fix ZOOKEEPER-851. The fix is simple. However, I am not sure if anyone in the community is relying on this ability.
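The check being proposed could be sketched as follows. This is a hypothetical standalone version (the method name and configured-id set are assumptions; the real change would live around QuorumCnxManager's connection handling), but it shows why an id-in-configuration test is stronger than merely parsing a valid-looking id:

```java
import java.util.Set;

// Hypothetical sanity check: reject connections whose advertised server id is
// not in the configured ensemble. A Nagios probe whose bytes happen to parse
// as an id would still be rejected unless that id is actually configured.
public class ServerIdCheck {
    // Wildcard id for observers, as discussed in ZOOKEEPER-933 (value assumed).
    static final long OBSERVER_ID = Long.MAX_VALUE;

    static boolean acceptConnection(long remoteSid, Set<Long> configuredIds,
                                    boolean allowObservers) {
        if (remoteSid == OBSERVER_ID) {
            // Observers connect under a wildcard, so a pure membership test
            // would break them; this is the interaction flagged above.
            return allowObservers;
        }
        return configuredIds.contains(remoteSid);
    }

    public static void main(String[] args) {
        Set<Long> ids = Set.of(0L, 1L, 2L, 3L, 4L);
        if (!acceptConnection(2L, ids, true)) {
            throw new AssertionError("configured peer rejected");
        }
        if (acceptConnection(42L, ids, true)) {
            throw new AssertionError("unknown peer accepted");
        }
        if (!acceptConnection(OBSERVER_ID, ids, true)) {
            throw new AssertionError("observer wildcard rejected");
        }
        System.out.println("ok");
    }
}
```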
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933928#action_12933928 ] Flavio Junqueira commented on ZOOKEEPER-880:

I think we agree that monitoring alone was not causing the issue. But your logs indicate that there were some orphan threads due to the monitoring, and we can see it from excerpts of your logs like the one I posted above. Without the monitoring, the same problem is still triggered, though apparently in a different way, and it is not clear why. You can see it from all the Channel eof messages in the log. To solve this issue, we need to understand the following:
1. What's causing those IOExceptions?
2. Why are we even starting a new connection if there is no leader election going on?

Do you folks have any idea if there is anything in your environment that could be causing those TCP connections to break?
[jira] Commented: (ZOOKEEPER-933) Remove wildcard QuorumPeer.OBSERVER_ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933709#action_12933709 ] Flavio Junqueira commented on ZOOKEEPER-933:

+1 for the idea, sounds right to me.

Remove wildcard QuorumPeer.OBSERVER_ID

Key: ZOOKEEPER-933
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-933
Project: Zookeeper
Issue Type: Sub-task
Reporter: Vishal K
Fix For: 3.4.0

1. I have a question about the following piece of code in QCM:

{noformat}
if (remoteSid == QuorumPeer.OBSERVER_ID) {
    /*
     * Choose identifier at random. We need a value to identify
     * the connection.
     */
    remoteSid = observerCounter--;
    LOG.info("Setting arbitrary identifier to observer: " + remoteSid);
}
{noformat}

Should we allow this? The problem with this code is that if a peer connects twice with QuorumPeer.OBSERVER_ID, we will end up creating threads for this peer twice, which could result in redundant SendWorker/RecvWorker threads. I haven't used observers yet. The documentation at http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html says that, just like followers, observers should have server IDs. In which case, why do we want to provide a wildcard?
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933713#action_12933713 ] Flavio Junqueira commented on ZOOKEEPER-934:

I was not thinking about OBSERVER_ID; good point. I think that should do it.
[jira] Commented: (ZOOKEEPER-918) Review of BookKeeper Documentation (Sequence flow and failure scenarios)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932902#action_12932902 ] Flavio Junqueira commented on ZOOKEEPER-918:

Amit, just to give you an update: we have been discussing switching to a new documentation system soon (ZOOKEEPER-925), so we were wondering if it would be a problem to wait until we have it. Assuming the new system is easier to work with, we can more easily introduce your notes into the release documentation. Does that sound ok? If we take too long, then we can rethink it and find another way, like creating a wiki page or committing the pdf directly and linking to it from the BK documentation.

Review of BookKeeper Documentation (Sequence flow and failure scenarios)

Key: ZOOKEEPER-918
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-918
Project: Zookeeper
Issue Type: Task
Components: documentation
Reporter: Amit Jaiswal
Assignee: Amit Jaiswal
Priority: Minor
Fix For: 3.3.3, 3.4.0
Attachments: BookKeeperInternals.doc, BookKeeperInternals.pdf
Original Estimate: 2h
Remaining Estimate: 2h

I have prepared a document describing some of the internals of BookKeeper in terms of:
1. Sequence of operations
2. Files layout
3. Failure scenarios

The document was prepared mostly by reading the code. Could somebody who understands the design please review it?
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932909#action_12932909 ] Flavio Junqueira commented on ZOOKEEPER-922:

Hi Camille, say a client disconnects from server A and reconnects to server B with the same session. Server A believes the session should be expired because it received an exception. Server B believes the session should stay alive, since the client just reconnected. What should we do in this case: kill the session or not? Our suggestion is to have an option that enables fast expiration and disables clients moving sessions to other servers. We are certainly not proposing to remove the second piece of functionality from ZooKeeper altogether.

enable faster timeout of sessions in case of unexpected socket disconnect

Key: ZOOKEEPER-922
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
Project: Zookeeper
Issue Type: Improvement
Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
Fix For: 3.4.0
Attachments: ZOOKEEPER-922.patch

In the case when a client connection is closed due to a socket error, instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a zookeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing a longer heartbeat-based timeout for java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout.
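The proposal in the issue description can be reduced to one rule: on an unclean disconnect, shrink the session's remaining timeout to minSessionTimeout instead of waiting out the negotiated value. A hypothetical standalone sketch (class and field names are illustrative; the real change would touch the server's session tracking), including the opt-in flag discussed in the comment:

```java
// Illustrative model of fast session expiration on unclean socket close.
public class FastExpire {
    static final int MIN_SESSION_TIMEOUT_MS = 4000; // assumed value

    static class Session {
        int timeoutMs;
        Session(int negotiatedMs) { this.timeoutMs = negotiatedMs; }
    }

    // Called when the connection dies with an IOException rather than a
    // client-initiated close. Only shrinks the timeout, never grows it, and
    // only when the operator has opted in (so session moving can stay safe).
    static void onUncleanDisconnect(Session s, boolean fastExpireEnabled) {
        if (fastExpireEnabled && s.timeoutMs > MIN_SESSION_TIMEOUT_MS) {
            s.timeoutMs = MIN_SESSION_TIMEOUT_MS;
        }
    }

    public static void main(String[] args) {
        Session s = new Session(30000); // long timeout for GC-prone clients
        onUncleanDisconnect(s, true);
        if (s.timeoutMs != MIN_SESSION_TIMEOUT_MS) {
            throw new AssertionError("expected fast expiration timeout");
        }
        Session t = new Session(30000);
        onUncleanDisconnect(t, false); // option disabled: nothing changes
        if (t.timeoutMs != 30000) {
            throw new AssertionError("timeout changed with option disabled");
        }
        System.out.println("ok");
    }
}
```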
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932974#action_12932974 ] Flavio Junqueira commented on ZOOKEEPER-900:

+1, great job, Vishal! On your question, the problem is that it is not easy to decide when a peer can close its connections, because it doesn't know what state the others are in and it might still need to receive and respond to notifications. In any case, if you have an idea for how to do it and want to discuss it further, we could create a new jira and work there, since this is a separate issue.

FLE implementation should be improved to use non-blocking sockets

Key: ZOOKEEPER-900
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
Project: Zookeeper
Issue Type: Bug
Reporter: Vishal K
Assignee: Vishal K
Priority: Critical
Fix For: 3.4.0
Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2

From earlier email exchanges:

1. Blocking connects and accepts:

a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect. AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires it interrupts the AsyncConnect thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others on the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SenderWorker and RecvWorker threads, since they each block on IO to their respective peer.

b) The blocking IO problem is not just restricted to connectOne(), but also exists in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also, the code has an inherent cycle: initiateConnection() and receiveConnection() will have to be very carefully synchronized, otherwise we could run into deadlocks. This code is going to be difficult to maintain/modify.

Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
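The "real fix" the description points to, a Selector-based connect with a bounded wait instead of a helper thread plus timer, can be sketched roughly like this (addresses and the timeout are illustrative; this is a standalone sketch, not the committed patch):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Non-blocking connect with an upper bound on the wait: register the channel
// for OP_CONNECT and let select(timeout) bound how long we block, instead of
// an interruptible AsyncConnect thread plus timer.
public class TimedConnect {
    static SocketChannel connectWithTimeout(InetSocketAddress addr, long timeoutMs)
            throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);
        if (ch.connect(addr)) {
            return ch; // connected immediately (possible on loopback)
        }
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_CONNECT);
            if (sel.select(timeoutMs) > 0) {
                try {
                    if (ch.finishConnect()) {
                        return ch;
                    }
                } catch (IOException e) {
                    // connection refused/reset: fall through and give up
                }
            }
        }
        ch.close();
        return null; // timed out or failed; caller can retry later
    }

    public static void main(String[] args) throws IOException {
        // Self-test against a local listening socket.
        try (ServerSocket ss = new ServerSocket(0)) {
            SocketChannel ch = connectWithTimeout(
                    new InetSocketAddress("127.0.0.1", ss.getLocalPort()), 2000);
            if (ch == null) {
                throw new AssertionError("expected connect to succeed");
            }
            ch.close();
        }
        System.out.println("ok");
    }
}
```

The same Selector could also multiplex the accepts and reads that currently block the Listener thread, which is the broader restructuring point (b) argues for.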
[jira] Updated: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-900:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed revision 1036071.
[jira] Commented: (ZOOKEEPER-902) Fix findbug issue in trunk Malicious code vulnerability
[ https://issues.apache.org/jira/browse/ZOOKEEPER-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932989#action_12932989 ] Flavio Junqueira commented on ZOOKEEPER-902:

Agreed, I've seen that 900 didn't include it. I'd rather let Pat take care of wrapping up this issue...

Fix findbug issue in trunk Malicious code vulnerability

Key: ZOOKEEPER-902
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-902
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Priority: Minor
Fix For: 3.4.0

https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/970/artifact/trunk/findbugs/zookeeper-findbugs-report.html#Warnings_MALICIOUS_CODE

Malicious code vulnerability warning:
MS: org.apache.zookeeper.server.quorum.LeaderElection.epochGen isn't final but should be
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932999#action_12932999 ] Flavio Junqueira commented on ZOOKEEPER-900:

Ok, there might have been some confusion. I saw the patch-available flag up and interpreted it as ready to commit (after review, of course). If you still think there is work to be done on this jira, Vishal, please consider reopening it and creating sub-tasks. From your comments, I can extract at least three possible tasks. Once you create sub-tasks (or new independent jiras), I will comment on your questions. I'd rather do that so that we don't mix up the discussion. Is that ok?
[jira] Commented: (ZOOKEEPER-933) Remove wildcard QuorumPeer.OBSERVER_ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933034#action_12933034 ] Flavio Junqueira commented on ZOOKEEPER-933:

Hi Vishal, the reason for the wildcard is explained in ZOOKEEPER-599. I'd rather keep this feature for the reasons explained before, but it would be good to prevent the case you mention.
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933045#action_12933045 ] Flavio Junqueira commented on ZOOKEEPER-934:

It sounds like we need to do this so that we don't get affected by port scanners or monitoring systems. However, I'm not sure if this impacts the observers feature we are discussing in the other jira (ZOOKEEPER-933). It sounds like it does, but I need to verify. Any thoughts?
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932500#action_12932500 ] Flavio Junqueira commented on ZOOKEEPER-922:

Hi! I'm confused by this proposal. What happens if the client disconnects from one server and moves to another? Or do you want to be able to disable that feature as well?
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932512#action_12932512 ] Flavio Junqueira commented on ZOOKEEPER-922:

I think I understand your motivation, but I'm not sure it will work the way you expect it to. I'm afraid that you might end up getting lots of false positives due to delays introduced by the environment (e.g., jvm gc). Let me clarify one thing first: when you refer to clients crashing, are you thinking of the jvm crashing or the whole machine becoming unavailable? Basically, my question is whether you really expect connections to be cleanly closed or not.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932228#action_12932228 ] Flavio Junqueira commented on ZOOKEEPER-900:

If we fix the findbugs issue here, then we should just close ZOOKEEPER-902, stating that it was resolved in ZOOKEEPER-900.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931353#action_12931353 ] Flavio Junqueira commented on ZOOKEEPER-900: Hi Vishal, This is a good question. I'm actually assuming that the behavior of TCP is such that if I send a message and then close the channel properly (calling close()), due to the reliability and order guarantees of the connection, the message will get through before the connection closes. Essentially, I'm relying upon the TCP ACK to do exactly what you're proposing. However, it might be a good idea to make sure that the assumption is correct; if you know the answer already, just let me know. Overall I do agree that having an ACK is important.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931460#action_12931460 ] Flavio Junqueira commented on ZOOKEEPER-900: That's a pretty strong statement. You're essentially suggesting that we shouldn't rely upon TCP to implement even its basic functionality. Also, my understanding is that Vishal is just reasoning about the code and he hasn't been able to reproduce that situation. Please correct me if I'm mistaken, Vishal.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931464#action_12931464 ] Flavio Junqueira commented on ZOOKEEPER-880: Benoit, just to clarify, is this also due to monitoring or scanning? QuorumCnxManager$SendWorker grows without bounds Key: ZOOKEEPER-880 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880 Project: Zookeeper Issue Type: Bug Affects Versions: 3.2.2 Reporter: Jean-Daniel Cryans Priority: Critical Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like: {noformat} tickTime=3000 dataDir=/somewhere_thats_not_tmp clientPort=2181 initLimit=10 syncLimit=5 server.0=sv4borg9:2888:3888 server.1=sv4borg10:2888:3888 server.2=sv4borg11:2888:3888 server.3=sv4borg12:2888:3888 server.4=sv4borg13:2888:3888 {noformat} The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
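One way the unbounded SendWorker growth can be avoided is to tie each receiver to its sender so that a broken channel tears down both threads, not just the receiving one. A minimal sketch with invented class shapes (simplified stand-ins, not the actual QuorumCnxManager implementation):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch only: a receiver that shuts down its paired sender when the channel breaks.
public class PairedWorkers {
    static class SendWorker extends Thread {
        volatile boolean running = true;
        void finish() { running = false; interrupt(); }
        public void run() {
            while (running) {
                try { Thread.sleep(100); }          // stand-in for draining a send queue
                catch (InterruptedException e) { return; }
            }
        }
    }

    static class RecvWorker extends Thread {
        final Socket sock;
        final SendWorker sender;
        RecvWorker(Socket sock, SendWorker sender) { this.sock = sock; this.sender = sender; }
        void finish() {
            sender.finish();                        // tear down the counterpart too
            try { sock.close(); } catch (IOException ignored) { }
        }
        public void run() {
            try {
                if (sock.getInputStream().read() < 0) finish();   // EOF: peer went away
            } catch (IOException broken) {
                finish();                           // broken channel kills both workers
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);
        Socket local = new Socket("127.0.0.1", server.getLocalPort());
        Socket remote = server.accept();
        SendWorker sender = new SendWorker();
        RecvWorker receiver = new RecvWorker(local, sender);
        sender.start();
        receiver.start();
        remote.close();                             // simulate the peer disappearing
        receiver.join(5000);
        sender.join(5000);
        System.out.println("senderStopped=" + !sender.isAlive());
        server.close();
    }
}
```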
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931470#action_12931470 ] Flavio Junqueira commented on ZOOKEEPER-900: Sure, I can investigate a little further, and Vishal let us know if you find anything.
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930953#action_12930953 ] Flavio Junqueira commented on ZOOKEEPER-928: Good point, Pat. I should have remembered this, since our hack to introduce the connection timeout in QCM previously was through the socket directly, so it makes sense that we would have to do the same for other blocking operations. In fact, I quickly tried replacing the read call in receiveConnection with the following: {noformat} s.socket().getInputStream().read(msgBytes); {noformat} and I get a SocketTimeoutException after the specified timeout. Follower should stop following and start FLE if it does not receive pings from the leader - Key: ZOOKEEPER-928 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Critical In Follower.followLeader(), after syncing with the leader, the follower does: while (self.isRunning()) { readPacket(qp); processPacket(qp); } It looks like it relies on socket timeout expiry to figure out if the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if the Leader has a software hang but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the Leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader. We should keep track of pings received from the leader, and if we haven't seen a ping packet from the leader for (syncLimit * tickTime) time, give up following the leader.
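The workaround quoted in the comment above, reading through the channel's socket input stream so that SO_TIMEOUT applies, can be reproduced in isolation. A sketch under the assumption that the channel is in blocking mode (the adaptor stream read would throw IllegalBlockingModeException otherwise):

```java
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.nio.channels.SocketChannel;

public class StreamReadTimeout {
    public static void main(String[] args) throws Exception {
        // Server that accepts the connection but never sends anything.
        ServerSocket server = new ServerSocket(0);
        SocketChannel ch = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.getLocalPort()));
        ch.socket().setSoTimeout(200);              // SO_TIMEOUT in milliseconds

        byte[] msgBytes = new byte[32];
        boolean timedOut = false;
        try {
            // Reading via the adaptor's stream honors SO_TIMEOUT,
            // unlike SocketChannel.read(ByteBuffer).
            InputStream in = ch.socket().getInputStream();
            in.read(msgBytes);
        } catch (SocketTimeoutException expected) {
            timedOut = true;
        }
        System.out.println("timedOut=" + timedOut);
        ch.close();
        server.close();
    }
}
```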
[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930546#action_12930546 ] Flavio Junqueira commented on ZOOKEEPER-909: Thomas, Check the console output on hudson, close to the end of the page. The failure seems to be in the C tests. Extract NIO specific code from ClientCnxn - Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch This patch is mostly the same as my last one for ZOOKEEPER-823, minus everything Netty related. This means this patch only extracts all NIO-specific code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket. I've redone this patch from current trunk step by step now and couldn't find any logical error. I've already done a couple of successful test runs and will continue to do so tonight. It would be nice if we could apply this patch as soon as possible to trunk. This allows us to continue to work on the Netty integration without blocking on the ClientCnxn class. Adding Netty after this patch should be only a matter of adding the ClientCnxnSocketNetty class with the appropriate test cases. You could help me by reviewing the patch and by running it on whatever test server you have available. Please send me any complete failure log you should encounter to thomas at koch point ro. Thx! Update: Until now, I've collected 8 successful builds in a row!
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930774#action_12930774 ] Flavio Junqueira commented on ZOOKEEPER-928: I've just seen the messages on zookeeper-dev, and I'm not sure this is right: # readPacket is implemented in Learner.java, and the socket read is performed in this line: leaderIs.readRecord(pp, packet); # leaderIs is an InputArchive instance instantiated in Learner:connectToLeader; # The socket used to instantiate leaderIs has its SO_TIMEOUT value set right before in connectToLeader: sock.setSoTimeout(self.tickTime * self.initLimit). Consequently, the operation should not be delayed indefinitely and should return after self.tickTime * self.initLimit. This discussion on SO_TIMEOUT sounds familiar, huh? ;-)
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930788#action_12930788 ] Flavio Junqueira commented on ZOOKEEPER-928: Hi Vishal, My understanding is that the readRecord call in readPacket will timeout, even if the TCP connection is still up. The documentation in: http://download.oracle.com/javase/6/docs/api/java/net/SocketOptions.html says that: {noformat} static int SO_TIMEOUT Set a timeout on blocking Socket operations: {noformat}
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930800#action_12930800 ] Flavio Junqueira commented on ZOOKEEPER-928: My understanding is that SO_TIMEOUT also affects SocketChannel, since it builds on top of a Socket object.
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930851#action_12930851 ] Flavio Junqueira commented on ZOOKEEPER-928: The documentation refers to SocketInputStream.read(), but it doesn't mention SocketChannel.read(). I ran a quick test with QuorumCnxManager and it doesn't seem to work. So maybe it is true that setting SO_TIMEOUT has no effect on SocketChannel.read(), which is kind of surprising to me.
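The quick test described in the comment can be reproduced along the following lines; the watchdog thread is just an illustration device. On the JDKs I'm aware of, SO_TIMEOUT set via the adaptor socket does not bound a blocking SocketChannel.read():

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class ChannelReadIgnoresSoTimeout {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);   // accepts, never writes
        SocketChannel ch = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.getLocalPort()));
        ch.socket().setSoTimeout(100);               // would fire quickly, if honored

        Thread reader = new Thread(() -> {
            try {
                ch.read(ByteBuffer.allocate(32));    // blocks with no timeout
            } catch (Exception closed) {
                // AsynchronousCloseException when main closes the channel below
            }
        });
        reader.start();
        reader.join(600);                            // well past the 100 ms timeout
        System.out.println("stillBlocked=" + reader.isAlive());
        ch.close();                                  // unblock the reader
        reader.join();
        server.close();
    }
}
```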
Re: [VOTE] Release ZooKeeper 3.3.2 (candidate 0)
+1, unit tests pass. Also ran a few manual tests. I must say that on one of the computers I tried, AsyncHammerTest fails, and the error message I get is that there are no tests. Discussing with Pat, we ended up concluding that it is most likely a configuration problem. I don't think that's a reason to -1 it, though.
-Flavio
On Nov 11, 2010, at 12:24 AM, Henry Robinson wrote:
+1. Python looks good.
On 10 November 2010 14:51, Michi Mutsuzaki mic...@yahoo-inc.com wrote:
+1. I ran my benchmark test on the release candidate for one hour, and got similar numbers as 3.3.0. --Michi
On 11/10/10 11:09 AM, "Mahadev Konar" maha...@yahoo-inc.com wrote:
+1 for the release. Ran ant test and a couple of smoke tests. Created znodes and shut down zookeeper servers to test durability. Deleted znodes to make sure they are deleted. Shot down servers one at a time to confirm correct behavior. Thanks, mahadev
On 11/4/10 11:17 PM, "Patrick Hunt" ph...@apache.org wrote:
I've created a candidate build for ZooKeeper 3.3.2. This is a bug fix release addressing twenty-six issues (eight critical) -- see the release notes for details.
*** Please download, test and VOTE before the
*** vote closes 11pm pacific time, Tuesday, November 9.
*** http://people.apache.org/~phunt/zookeeper-3.3.2-candidate-0/
Should we release this? Patrick
-- Henry Robinson, Software Engineer, Cloudera, 415-994-6679
flavio junqueira, research scientist, f...@yahoo-inc.com, direct +34 93-183-8828, avinguda diagonal 177, 8th floor, barcelona, 08018, es, phone (408) 349 3300, fax (408) 349 3301
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930201#action_12930201 ] Flavio Junqueira commented on ZOOKEEPER-925: I'm fine with moving to a different doc system and having our own look-and-feel, but my main concern is having doc generation that is relatively easy to use. If it is difficult to use, then contributors won't feel very motivated to write documentation... It would be great to get folks to stop whining when they have to write documentation, and stop blaming Forrest. :-) To be fair, I must say that my experience with Forrest hasn't been great. Having to insert tags by hand and not being able to find descriptions for tags easily made it hard for me to like Forrest. The output looks good to me, though. Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end the Maven site generation plugin turned out to be by far the best option.
You can see our nascent site here (no attempt at styling, etc. so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence (notice this was standard wiki markup: confluence wiki markup, same as available from apache). You can read more about the mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Note that other formats are available, not just confluence markup; you can even use different markup formats in the same site (probably not a great idea, but in some cases it might be handy; for example, since whirr uses the confluence wiki, we can pretty much copy/paste source docs from the wiki to our site (svn) if we like). Re maven vs our current ant based build: it's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period.
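For reference, the mvn site setup described above boils down to a small pom fragment plus markup sources under src/site. A sketch only; the plugin and Doxia module versions here are illustrative, not taken from the whirr build:

```xml
<!-- pom.xml fragment: site generation from wiki-style sources in src/site -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-site-plugin</artifactId>
      <version>2.1.1</version>
      <dependencies>
        <!-- Doxia module that understands confluence wiki markup -->
        <dependency>
          <groupId>org.apache.maven.doxia</groupId>
          <artifactId>doxia-module-confluence</artifactId>
          <version>1.1.2</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
```

Pages then live as plain wiki files, e.g. src/site/confluence/quick-start-guide.confluence, and `mvn site` renders them to HTML.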
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930214#action_12930214 ] Flavio Junqueira commented on ZOOKEEPER-925: I was wondering if by getting away from checking in generated docs, you mean that anyone should be able to come and change docs freely.
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929675#action_12929675 ] Flavio Junqueira commented on ZOOKEEPER-914: Hi Vishal, The Socket documentation does sound ambiguous, but my understanding is that SO_TIMEOUT is for blocking mode, not non-blocking mode. Non-blocking calls return immediately, so they shouldn't need a timeout value, no? Independent of using it or not, I would be curious to learn if my understanding is incorrect. About the release to include the fix, I think Mahadev later came and changed it to 3.3.3. It is fine with me, and we just need to check what the schedule for 3.3.3 is. My preference is to work directly on ZOOKEEPER-900 (or 901, which I think might be a more significant change), if you think we can produce a patch in time for 3.3.3. QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. While testing our application we ran into a scenario where a rebooted follower could not join the cluster.
Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():
{noformat}
Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        - locked 0x7fa93315f988 (a java.lang.Object)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
{noformat}
I had pointed out this bug along with several other problems in QuorumCnxManager earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager.
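The failure mode above, a single stalled handshake read wedging the Listener, goes away if the blocking read is moved off the accept thread and bounded with a timeout. A sketch with invented names (not the actual fix that was committed): a stalled peer that never sends its id cannot stop a well-behaved peer from being served.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class NonStallingListener {
    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(0);
        ExecutorService pool = Executors.newCachedThreadPool();
        BlockingQueue<String> received = new LinkedBlockingQueue<>();

        Thread acceptLoop = new Thread(() -> {
            try {
                while (true) {
                    Socket peer = listener.accept();
                    pool.submit(() -> {             // handshake off the accept thread
                        try {
                            peer.setSoTimeout(2000); // bound the blocking read
                            BufferedReader in = new BufferedReader(
                                    new InputStreamReader(peer.getInputStream()));
                            String line = in.readLine();
                            if (line != null) received.add(line);
                        } catch (IOException ignored) { }
                    });
                }
            } catch (IOException done) { }           // listener closed, exit loop
        });
        acceptLoop.start();

        // "Bad" peer: connects but never sends its id.
        Socket bad = new Socket("127.0.0.1", listener.getLocalPort());
        // "Good" peer: connects afterwards and sends its id.
        Socket good = new Socket("127.0.0.1", listener.getLocalPort());
        good.getOutputStream().write("server.2\n".getBytes());
        good.getOutputStream().flush();

        // The good peer is served even though the bad one is stalled.
        String got = received.poll(3, TimeUnit.SECONDS);
        System.out.println("received=" + got);

        bad.close(); good.close(); listener.close(); pool.shutdownNow();
    }
}
```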
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929678#action_12929678 ] Flavio Junqueira commented on ZOOKEEPER-917: Hi Vishal, There is possibly a misunderstanding here. Server 2 reported in this jira (the leader) does not go back to an earlier epoch, but the other two do, and they are following, so if I understand your argument correctly, the exception is being applied as you suggest. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929988#action_12929988 ] Flavio Junqueira commented on ZOOKEEPER-925: Pat, Any thoughts on how it would be to port from Forrest to Maven site generation? Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end Maven site generation plugin turned out to be by far the best option. You can see our nascent site here (no attempt at styling,etc so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence notice this was standard wiki markup (confluence wiki markup, same as available from apache) You can read more about mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Notice that other formats are available, not just confluence markup, also note that you can use different markup formats if you like in the same site (although probably not a great idea, but in some cases might be handy, for example whirr uses the confluence wiki, so we can pretty much copy/paste source docs from wiki to our site (svn) if we like) Re maven vs our current ant based build. It's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929345#action_12929345 ] Flavio Junqueira commented on ZOOKEEPER-914: Hi Vishal, I also appreciate your contributions and your comments. I also understand your frustration when you find issues with the code, but I think that it is possibly equally frustrating for the developer who thought that at least basic issues were covered, so please try to think that we don't introduce bugs on purpose (at least I don't) and our review process is not perfect. Regarding clover reports, we have agreed already that code coverage is not bulletproof, and in fact there have been several other metrics proposed in the scientific literature, but it does indicate that some call path including a given piece of code was exercised. It certainly doesn't measure more complex cases, like race conditions, crashes and so on. In fact, if you have a better way of measuring test coverage, I'd be happy to hear about it. I'm not sure if you agree, but it seems to me that we should close this jira because the technical discussion here seems to be similar to the one of ZOOKEEPER-900. I'll try to address the concerns you raised regardless of what will happen to this jira: # My point about SO_TIMEOUT comes from here: http://download.oracle.com/javase/6/docs/api/java/net/Socket.html#setSoTimeout%28int%29 # I obviously prefer to go with real fixes instead of hacking, but we need to have release 3.3.2 out, and it sounded like introducing a configurable timeout would fix your problem until the next release; # About testing beyond the handshake, I'm not sure what you're proposing. If the blocking calls are part of the handshake and this is what is failing for you, then this is what we should target now, no? 
QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. While testing our application we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for indefinite amount of time in receiveConnect() Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233) at sun.nio.ch.IOUtil.read(IOUtil.java:206) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) - locked 0x7fa93315f988 (a java.lang.Object) at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210) at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501) I had pointed out this bug along with several other problems in QuorumCnxManager earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. Also, points out to lack of failure tests for QuorumCnxManager. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929354#action_12929354 ] Flavio Junqueira commented on ZOOKEEPER-917: Hi Vishal, I certainly understand that not having dedicated development time is an issue. I actually didn't know you were interested in cluster membership... I'm glad to hear it, though. On your questions: # Suppose we have an ensemble comprising 3 servers: A, B, and C. Now suppose that C is the leader, and both A and B follow C. If A disconnects from C for whatever reason (e.g., network partition) and it tries to elect a leader, it won't find any other process in the LOOKING state. It will actually receive a notification from C saying that it is leading and one from B saying that it is following C, both with an earlier leader election epoch. To avoid having A locked out (not able to elect C as leader), we implemented this exception: a process accepts going back to an earlier leader election epoch only if it receives a notification from the leader saying that it is leading and from a quorum saying that it is following; # I'm not sure if you're referring to the specific problem of this jira or if you are asking about my hypothetical example. Assuming it is the former, the follower (Follower:followLeader()) checks if the leader is proposing an earlier epoch, and if not, it accepts the leader snapshot. Because the epoch is the same, all followers will accept the leader snapshot and follow it. 
Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
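The lock-out exception described in the comment above (a server accepts going back to an earlier leader election epoch only if it hears LEADING from the candidate and FOLLOWING from a quorum) can be sketched as a standalone predicate. All class, enum, and method names below are illustrative stand-ins, not the actual FastLeaderElection code:

```java
import java.util.List;

// Illustrative sketch of the lock-out exception discussed above: a server
// in a later election epoch agrees to follow a leader from an earlier
// epoch only if that leader says LEADING and a quorum says FOLLOWING it.
// Names are hypothetical; this is not the FastLeaderElection implementation.
public class EpochException {
    enum State { LOOKING, FOLLOWING, LEADING }

    static final class Notification {
        final long fromSid;   // sender's server id
        final long leaderSid; // leader the sender supports
        final State state;    // sender's reported state
        Notification(long fromSid, long leaderSid, State state) {
            this.fromSid = fromSid; this.leaderSid = leaderSid; this.state = state;
        }
    }

    // ensembleSize servers; a quorum is a strict majority.
    static boolean mayFollowEarlierEpoch(long candidateSid,
                                         List<Notification> notifications,
                                         int ensembleSize) {
        boolean leaderClaims = false;
        long followers = 0;
        for (Notification n : notifications) {
            if (n.fromSid == candidateSid && n.state == State.LEADING) {
                leaderClaims = true;
            } else if (n.leaderSid == candidateSid && n.state == State.FOLLOWING) {
                followers++;
            }
        }
        // The candidate leader itself counts toward the quorum supporting it.
        return leaderClaims && (followers + 1) > ensembleSize / 2;
    }

    public static void main(String[] args) {
        // 3-server ensemble: server 2 claims LEADING, server 1 follows it.
        List<Notification> notes = List.of(
            new Notification(2, 2, State.LEADING),
            new Notification(1, 2, State.FOLLOWING));
        System.out.println(mayFollowEarlierEpoch(2, notes, 3)); // true
    }
}
```

In this jira's scenario a freshly wiped server can satisfy such a check via stale FOLLOWING notifications, which is what the rest of the thread digs into.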
[jira] Commented: (ZOOKEEPER-918) Review of BookKeeper Documentation (Sequence flow and failure scenarios)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928566#action_12928566 ] Flavio Junqueira commented on ZOOKEEPER-918: This is really nice, Amit, thanks. I haven't had a chance to go carefully over the document, but my first reaction is that this should be a live document, and perhaps a wiki page would suit this purpose well. What do you think? Review of BookKeeper Documentation (Sequence flow and failure scenarios) Key: ZOOKEEPER-918 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-918 Project: Zookeeper Issue Type: Task Components: documentation Reporter: Amit Jaiswal Priority: Trivial Fix For: 3.3.3, 3.4.0 Attachments: BookKeeperInternals.pdf Original Estimate: 2h Remaining Estimate: 2h I have prepared a document describing some of the internals of bookkeeper in terms of: 1. Sequence of operations 2. Files layout 3. Failure scenarios The document was prepared mostly by reading the code. Can somebody who understands the design review the same? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928171#action_12928171 ] Flavio Junqueira commented on ZOOKEEPER-917: The program I was using to open your logs was hiding some of the messages for some reason unknown to me. I now understand why the leader was elected in your case and the behavior is legitimate. Let me try to explain. We currently repeat the last notification sent to a given server upon reconnecting to it. This is to avoid problems with messages partially sent, and, assuming no further bugs, the protocol is resilient to message duplicates. At the same time, a server A decides to follow another server B if it receives a message from B saying that B is leading and from a quorum saying that they are following, even if A is in a later election epoch. This mechanism is there to avoid A being locked out of the ensemble in the case it partitions away and comes back later. From your logs, what happens is: # Fresh server 2 receives previous notifications from 0 and 1, and decides to lead; # Server 1 receives the last message from server 0 saying that it is following 2 (which was the previous leader), and the notification from 2 saying that it is leading. Server 1 consequently decides to follow 2; # Server 0 receives the last message from server 1 saying that it is following 2 (which was the previous leader), and the notification from 2 saying that it is leading. Server 0 consequently decides to follow 2. Now the main problem I see is that the followers accept the snapshot from the leader, and they shouldn't, given that they have moved to a later epoch. I suspect that we currently allow a server to come back to an epoch it has been in in the past, again to avoid having a server locked out after being partitioned away and healing, but I need to do some further inspection. 
My overall take is that your case is unfortunately not legitimate, meaning that we don't currently provision for configuration changes. The case you expose in general constitutes a loss of quorum, and that violates one of our core assumptions. In more detail, a quorum supporting a leader must have a non-empty intersection with the quorum of servers that have accepted requests in the previous epoch. Wiping out the state of server 2, by replacing it with a fresh server, leads to the situation in which just one server contains all transactions accepted by a quorum (and possibly committed). If you hadn't replaced server 2 with a fresh server, then either server 2 would have been elected again just the same, and it would be fine because it was previously the leader, or it wouldn't have been elected because the leader was previously another server and the last notifications of 0 and 1 would be supporting a different server. On reconfigurations, we have talked about it (http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership), but we haven't made enough progress recently and it is currently not implemented. It would be great to get some help here. Let me know if this analysis makes any sense to you, please. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. 
The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
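The quorum-intersection assumption cited in the comment above (the quorum supporting a new leader must intersect the quorum that accepted requests in the previous epoch) holds by construction for majority quorums, which a small brute-force check can confirm. The class below is an illustrative sketch, not ZooKeeper code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative check of the majority-quorum intersection property cited
// above: any two majorities of the same ensemble share at least one
// server, so some member of the new leader's quorum saw the previous
// epoch's accepted requests -- unless a server's persistent state is wiped.
public class QuorumIntersection {
    // All subsets of {0..n-1} of size > n/2, encoded as bitmasks.
    static List<Integer> majorities(int n) {
        List<Integer> out = new ArrayList<>();
        for (int mask = 1; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) > n / 2) out.add(mask);
        }
        return out;
    }

    static boolean allPairsIntersect(int n) {
        List<Integer> qs = majorities(n);
        for (int a : qs)
            for (int b : qs)
                if ((a & b) == 0) return false; // disjoint pair found
        return true;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " servers: " + allPairsIntersect(n));
        }
    }
}
```

Wiping a server's disk breaks the argument not because quorums stop intersecting, but because the intersecting server may no longer remember what it accepted.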
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928179#action_12928179 ] Flavio Junqueira commented on ZOOKEEPER-917: Hi Alexandre, It is a key premise of important replication algorithms, like Paxos, that there is a portion of the state that persists across crashes (and recoveries). By replacing server 2 with a fresh server, you simply got rid of the persistent state. In general, making the replacement you've made may lead to trouble due to the problem I exposed a few postings up. Of course, if you wait for a successful election, the problem is supposed to go away because you have reestablished a quorum and this quorum does not contain the faulty server, but then you have to make sure the election happens before you introduce the fresh server, perhaps by checking through JMX or by inspecting the logs. Simply setting a reasonable timeout will work in most cases, but the leader election is not guaranteed to succeed, and there is a chance, likely to be small, that you'll end up with a corrupt state. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. 
The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira resolved ZOOKEEPER-917. Resolution: Not A Problem My pleasure to help. I'm marking it as not a problem for now, but feel free to come back and ask for more clarification if needed. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927869#action_12927869 ] Flavio Junqueira commented on ZOOKEEPER-917: Hi Alexandre, Could you please post your configuration parameters? I noticed the following in both excerpts:
{noformat}
INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 1
INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 2
{noformat}
which implies that both servers, 1 and 2, were starting from scratch, and in an ensemble of 3 servers they form a quorum. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927908#action_12927908 ] Flavio Junqueira commented on ZOOKEEPER-917: I downloaded your logs, but the out files are empty and I couldn't find the notification messages. By looking at the excerpts you posted, it sounds like node 1 tells 0 that it is following 2, and node 2 says that it is leading (this is fine as node 2 might have received some old messages), so node 0 must follow 2. Now the question is why node 1 decided to follow 2, especially because it has a higher zxid and the follower code should have rejected an attempt to follow a leader from an earlier epoch. It would be nice to have a look at the output of node 1. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927909#action_12927909 ] Flavio Junqueira commented on ZOOKEEPER-882: If I understand correctly what you're proposing, I think it won't be necessary to submit two separate patches. To verify that the test fails without the patch, I can simply add the test without applying any other modification in the patch file, and then run the test. After applying the modifications to the code base, I'd be able to verify that the test does not fail any longer. Does it sound right to you? Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Minor Fix For: 3.4.0 Attachments: 882.diff, restore, ZOOKEEPER-882.patch On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927949#action_12927949 ] Flavio Junqueira commented on ZOOKEEPER-917: Even though the logs do not make a lot of sense for me at this point, I was thinking that your scenario is not supposed to work given our guarantees. Let's look at an example. Suppose we have 3 servers: A, B, and C. Suppose that C is initially the leader and proposes operations that B is able to ack, but A doesn't. Now, suppose that I come and replace C with a fresh server, same id but empty state, and I do it before A and B are able to elect a new leader and recover. In this case, A and C may form a quorum and the state of the ZooKeeper ensemble would be empty. The replacement of server C with a fresh server violates our assumptions. It should work, though, if you add a fresh server with a working ensemble. That is, you let A and B elect a new leader, and then you start the new C server. In your case, I'm still not sure why it happens because the initial zxid of node 1 is 4294967742 according to your excerpt. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). 
DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-876) Unnecessary snapshot transfers between new leader and followers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-876: --- Status: Open (was: Patch Available) This is a nice catch, Diogo, and the patch looks good to me. I have a few very quick comments: # Instead of returning a pair of longs in startForwarding, we could simply return maxZxid and read lastProposed directly from the leader object. Doesn't that work? # The first comment of startForwarding is not saying much. Could you please expand it? # Could you please explain in the beginning of the test case what it is supposed to be testing? It helps later in remembering what the test does. Good job! Unnecessary snapshot transfers between new leader and followers --- Key: ZOOKEEPER-876 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-876 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Diogo Assignee: Diogo Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-876.patch When starting a new leadership, unnecessary snapshot transfers happen between the new leader and followers. This is so because of multiple small bugs: 1) the comparison of zxids is done based on a new proposal, instead of the last logged zxid (LearnerFollower.java:310); 2) if a follower is one zxid behind, the check of the interval of committed logs excludes the follower (LearnerFollower.java:269); 3) the bug reported in ZOOKEEPER-874 (commitLogs are empty after recovery). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
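Point 1 of the review above can be illustrated with a stand-in sketch: startForwarding returns only maxZxid, and the caller reads lastProposed directly from the leader object instead of receiving a pair of longs. The classes here are hypothetical mock-ups, not the actual Leader/LearnerHandler code:

```java
// Stand-in sketch of the refactor suggested in the review above: instead
// of startForwarding returning a pair of longs, it returns only maxZxid
// and the caller reads lastProposed directly from the leader object.
// These classes are illustrative, not the actual ZooKeeper Leader code.
public class ForwardingSketch {
    static class Leader {
        long lastProposed; // zxid of the last proposal this leader issued

        // Returns only the upper bound of the forwarded range; the caller
        // can read lastProposed from the leader itself when it needs it.
        long startForwarding(long peerLastZxid) {
            long maxZxid = Math.max(peerLastZxid, lastProposed);
            // ... a real implementation would queue the committed
            //     proposals in (peerLastZxid, maxZxid] here ...
            return maxZxid;
        }
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        leader.lastProposed = 0x100000005L;
        long maxZxid = leader.startForwarding(0x100000003L);
        // a single return value plus a direct field read replaces the pair
        System.out.println(Long.toHexString(maxZxid) + " "
                + Long.toHexString(leader.lastProposed));
    }
}
```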
[jira] Updated: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-882: --- Status: Open (was: Patch Available) Hi Jared, I was wondering if you can add a test case to your patch. Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Minor Fix For: 3.4.0 Attachments: 882.diff, restore, ZOOKEEPER-882.patch On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927657#action_12927657 ] Flavio Junqueira commented on ZOOKEEPER-702: Thanks, Abmar. It looks good to me. I have one quick comment, though. Is there any configuration value that could be causing tests to run slower? I have the impression that tests are running slightly slower with your patch. One in particular that caught my attention was QuorumZxidSyncTest:
{noformat}
Trunk:
[junit] Running org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 94.55 sec

702:
[junit] Running org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 139.985 sec
{noformat}
and this seems to be pretty consistent. GSoC 2010: Failure Detector Model - Key: ZOOKEEPER-702 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702 Project: Zookeeper Issue Type: Wish Reporter: Henry Robinson Assignee: Abmar Barros Fix For: 3.4.0 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch Failure Detector Module Possible Mentor Henry Robinson (henry at apache dot org) Requirements Java, some distributed systems knowledge, comfort implementing distributed systems protocols Description ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. 
This is the 'timeout' method of failure detection and works very well; however it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
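For reference, the phi-accrual idea linked above can be sketched compactly: rather than a boolean timeout, the detector outputs a suspicion level phi = -log10 of the probability that a heartbeat this overdue would still arrive, estimated from observed inter-arrival times. The sketch below uses an exponential inter-arrival model for simplicity (the paper fits a normal distribution); all names are hypothetical, and this is not the patch's code:

```java
// Illustrative phi-accrual failure detector sketch (after the paper
// linked above). Suspicion phi = -log10(P(heartbeat still arrives)),
// here under a simple exponential inter-arrival model; the original
// paper fits a normal distribution instead. Names are hypothetical.
public class PhiAccrualSketch {
    private double meanIntervalMs;     // running mean of heartbeat gaps
    private long lastHeartbeatMs = -1; // timestamp of the last heartbeat
    private long samples = 0;

    void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            double gap = nowMs - lastHeartbeatMs;
            samples++;
            meanIntervalMs += (gap - meanIntervalMs) / samples; // running mean
        }
        lastHeartbeatMs = nowMs;
    }

    // Suspicion level: grows without bound the longer heartbeats are late,
    // instead of flipping from "alive" to "dead" at a fixed tick count.
    double phi(long nowMs) {
        if (samples == 0) return 0.0;
        double elapsed = nowMs - lastHeartbeatMs;
        // Exponential model: P(gap > elapsed) = exp(-elapsed / mean),
        // so phi = -log10(P) = elapsed / (mean * ln(10)).
        return elapsed / (meanIntervalMs * Math.log(10));
    }

    public static void main(String[] args) {
        PhiAccrualSketch fd = new PhiAccrualSketch();
        for (long t = 0; t <= 5000; t += 1000) fd.heartbeat(t); // steady 1s beats
        System.out.println(fd.phi(6000) < fd.phi(16000)); // suspicion accrues
    }
}
```

An application then picks a phi threshold per deployment (LAN vs WAN), which is the tunability the issue description is after.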
[jira] Updated: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-914: --- Component/s: (was: server) (was: quorum) leaderElection QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. While testing our application we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():
{noformat}
Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked 0x7fa93315f988 (a java.lang.Object)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
{noformat}
I had pointed out this bug, along with several other problems in QuorumCnxManager, earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925746#action_12925746 ] Flavio Junqueira commented on ZOOKEEPER-914: Like Pat, I would also appreciate some more constructive comments (and behavior). From the Clover reports, we exercise a significant part of the QCM code; it is true, though, that we don't test the cases you have been exposing. Here is a way I believe we can reproduce this problem (I haven't implemented it, but it seems to make sense). The high-level idea is to make sure that if some server stops responding before it completes the handshake protocol, then no instance of QCM across all servers will block and prevent other servers from joining the ensemble. Suppose we configure an ensemble with 5 servers using QuorumBase. One of the servers will be a simple mock server, as we do in the CnxManagerTest tests. Now here is the sequence of steps to follow:
# Start three of the servers and confirm that they accept and execute operations;
# Start the mock server and execute the protocol partially. For the read case you mention, you can simply not send the server identifier. That will cause the read on the other end to block and to not accept more connections;
# Start a 5th server and check if it is able to join the ensemble.
A simple fix to get this working for you soon, along the lines of what we have done to make the connection timeout configurable, seems to be to set SO_TIMEOUT. But if you have other ideas, please lay them out. Please bear in mind that we should leave the major modifications for ZOOKEEPER-901, because those will take more time to develop and get into shape. QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. 
While testing our application we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():
{noformat}
Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked 0x7fa93315f988 (a java.lang.Object)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
{noformat}
I had pointed out this bug, along with several other problems in QuorumCnxManager, earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
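The SO_TIMEOUT suggestion from the comment above can be sketched as follows. This is a hypothetical illustration, not the actual QuorumCnxManager code (the names readServerId and HandshakeTimeoutSketch are mine): the point is that bounding the blocking handshake read turns a stalled peer into an IOException instead of a listener stuck forever.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch of the SO_TIMEOUT idea; not QuorumCnxManager API.
// A peer that connects but never sends its server id now produces an
// IOException after the timeout instead of blocking the listener forever.
public class HandshakeTimeoutSketch {
    static long readServerId(Socket peer, int timeoutMillis) throws IOException {
        peer.setSoTimeout(timeoutMillis);            // blocking reads now time out
        DataInputStream in = new DataInputStream(peer.getInputStream());
        try {
            return in.readLong();                    // the handshake's server id
        } catch (SocketTimeoutException e) {
            throw new IOException("peer did not complete handshake", e);
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(0);   // ephemeral port
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        Socket accepted = server.accept();
        try {
            readServerId(accepted, 200);             // the mock peer never writes
        } catch (IOException expected) {
            System.out.println("handshake aborted: " + expected.getMessage());
        }
        client.close();
        accepted.close();
        server.close();
    }
}
```

Note that setSoTimeout applies to stream reads on a plain Socket; a SocketChannel used in blocking mode, as in the issue, does not honor it, which is one more argument for the selector-based redesign deferred to ZOOKEEPER-901.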
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925754#action_12925754 ] Flavio Junqueira commented on ZOOKEEPER-885: Sure, let's discuss over e-mail and we can post our findings here later. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command:
{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}
The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with
{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}
It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-702: --- Status: Open (was: Patch Available) Hi Abmar, thanks for the addition to the patch. I was wondering if it is really a good idea to have both options, normal and exponential, implemented. Since your experiments have shown that exponential performs better, why not use it exclusively? Also, I was wondering if you have posted experimental numbers showing that exponential performs better. If we go with exponential only, then we don't need the modification to ivy.xml, right? One last comment: it doesn't look like the classes implementing PhiTimeoutEvaluator need to be public. Is this right? GSoC 2010: Failure Detector Model - Key: ZOOKEEPER-702 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702 Project: Zookeeper Issue Type: Wish Reporter: Henry Robinson Assignee: Abmar Barros Fix For: 3.4.0 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch Failure Detector Module Possible Mentor: Henry Robinson (henry at apache dot org) Requirements: Java, some distributed systems knowledge, comfort implementing distributed systems protocols Description: ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. 
This is the 'timeout' method of failure detection and works very well; however it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Allowing a ZooKeeper server to be part of multiple clusters
That proposal came in the context of federated zookeeper, and the motivation at the time was to use multiple overlapping clusters to enable increasing write throughput as we increase the number of servers. To my knowledge, we haven't made any progress on the implementation of such a feature. I'd be curious to understand what scenario Vishal envisions for such a 2-node cluster feature. If it is not federated, then we would have trouble with ZooKeeper because we rely upon one single leader to generate state updates. In the federated case, there is one leader (perhaps multiple during non-overlapping periods of time) for each partition. There is this wiki page I wrote a while back: http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper Hope it helps.
-Flavio

On Oct 25, 2010, at 11:24 PM, Vishal K wrote:

Hi Mahadev, it lets one run multiple 2-node clusters. Suppose I have an application that does a simple 2-way mirroring of my data and uses ZK for clustering. If I need to support many 2-node clusters, where will I find the spare machines to run the third instance for each cluster?
-Vishal

On Mon, Oct 25, 2010 at 5:14 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

Hi Vishal, this idea (2.) had been kicked around initially by Flavio. I think he'll probably chip in on the discussion. I am just curious: what's the idea behind your proposal? Is this to provide some kind of failure guarantees between a 2-node and a 3-node cluster?
Thanks
mahadev

On 10/25/10 1:05 PM, "Vishal K" vishalm...@gmail.com wrote:

Hi All, I am thinking about the choices one would have to support multiple 2-node clusters. Assume that for some reason one needs to support multiple 2-node clusters. This would mean they will have to figure out a way to run a third instance of ZK server for each cluster somewhere to ensure that a ZK cluster is available after a failure. This works well if we have to run one or two 2-node clusters. However, what if we have to run many 2-node clusters? I have the following options:
1. Find m machines to run the third instance of each cluster. Run n/m instances of ZK on each machine.
2. Modify ZooKeeper server to participate in multiple clusters. This will allow us to run y instances of the third node, where each instance will be part of n/y clusters.
3. Run the third instance of ZK server required for the ith cluster on one of the servers in the (i+1)%n cluster. Essentially, distribute the third instance across the other clusters.
The pros and cons of each approach are fairly obvious. While I prefer the third approach, I would like to check what everyone thinks about the second approach. Thanks.
-Vishal

flavio junqueira
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300
fax (408) 349 3301
Re: [VOTE] ZooKeeper as TLP?
+1

On Oct 23, 2010, at 12:47 AM, Henry Robinson wrote:

+1

On 22 October 2010 14:53, Mahadev Konar maha...@yahoo-inc.com wrote:

+1

On 10/22/10 2:42 PM, "Patrick Hunt" ph...@apache.org wrote:

Please vote as to whether you think ZooKeeper should become a top-level Apache project, as discussed previously on this list. I've included below a draft board resolution. Do folks support sending this request on to the Hadoop PMC?

Patrick

X. Establish the Apache ZooKeeper Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to distributed system coordination for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache ZooKeeper Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is responsible for the creation and maintenance of software related to distributed system coordination; and be it further

RESOLVED, that the office of "Vice President, Apache ZooKeeper" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache ZooKeeper Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache ZooKeeper Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache ZooKeeper Project:

* Patrick Hunt ph...@apache.org
* Flavio Junqueira f...@apache.org
* Mahadev Konar maha...@apache.org
* Benjamin Reed br...@apache.org
* Henry Robinson he...@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Patrick Hunt be appointed to the office of Vice President, Apache ZooKeeper, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache ZooKeeper Project; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop ZooKeeper sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Hadoop ZooKeeper sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: Heisenbugs, Bohrbugs, Mandelbugs?
Thomas, could you open jiras and make available the logs for the tests that failed for you?

Thanks,
-Flavio

On Oct 22, 2010, at 7:56 PM, Thomas Koch wrote:

Mahadev Konar:

Hi Thomas, could you verify this by just testing the trunk without your patch? You might very well be right that those tests are a little flaky. As for the hudson builds, Nigel is working on getting the patch builds for zookeeper running. As soon as that gets fixed, these flaky tests would show up more often.
Thanks
mahadev

On 10/20/10 11:48 PM, "Thomas Koch" tho...@koch.ro wrote:

Hi, last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk. One of these builds failed:
{noformat}
junit.framework.AssertionFailedError: Leader hasn't joined: 5
at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)
{noformat}
I did this many builds of trunk because, in my quest to redo the client netty integration step by step, I made one step which resulted in 2 failed builds out of 8. The two failures were both:

Hi Mahadev, as I've written, I did 42 builds of trunk over the night, from which 2 failed, and 8 builds of my patch during work time with 2 failures. I also did another round of builds of my patch during last night and got only 1 failure out of ~40 successful builds. So I believe that the high failure rate of 2/8 from the initial round of patch builds is because I did those builds during the day, while other developers also used other virtual machines on the same host.

Have a nice weekend,

Thomas Koch, http://www.koch.ro
Re: Restarting discussion on ZooKeeper as a TLP
+1 for moving forward, and I was wondering if you have an idea of when you'd have a draft of the proposal. It would be good to iterate over it, perhaps.

-Flavio

On Oct 20, 2010, at 7:50 PM, Patrick Hunt wrote:

It's been a few days, any thoughts? Acceptable? I'd like to keep moving the ball forward. Thanks.

Patrick

On Sun, Oct 17, 2010 at 8:43 PM, 明珠刘 redis...@gmail.com wrote:

+1

2010/10/14 Patrick Hunt ph...@apache.org

In March of this year we discussed a request from the Apache Board, and Hadoop PMC, that we become a TLP rather than a subproject of Hadoop:

Original discussion: http://markmail.org/thread/42cobkpzlgotcbin

I originally voted against this move, my primary concern being that we were not "ready" to move to tlp status given our small contributor base and limited contributor diversity. However I'd now like to revisit that discussion/decision. Since that time the team has been working hard to attract new contributors, and we've seen significant new contributions come in. There has also been feedback from board/pmc addressing many of these concerns (both on the list and in private). I am now less concerned about this issue and don't see it as a blocker for us to move to TLP status.

A second concern was that by becoming a TLP the project would lose its connection with Hadoop, a big source of new users for us. I've been assured (and you can see with the other projects that have moved to tlp status; pig/hive/hbase/etc...) that this connection will be maintained. The Hadoop ZooKeeper tab, for example, will redirect to our new homepage.

Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper, and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction.

I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles.

Regards,
Patrick
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Status: Open (was: Patch Available) Missing a test. ZooKeeper high cpu usage when invalid requests -- Key: ZOOKEEPER-893 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) Reporter: Thijs Terlouw Assignee: Thijs Terlouw Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-893.patch Original Estimate: 1h Remaining Estimate: 1h When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:
{noformat}
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
{noformat}
{noformat}
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
{noformat}
How to replicate this bug: perform an nmap portscan against your zookeeper server (nmap -sV -n your.ip.here -p4181), wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
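The failure mode of the second affected part is easy to demonstrate in isolation: channel.read() returns -1 at end-of-stream, and a loop that does not treat a negative return as an error spins forever, because hasRemaining() never becomes false once the peer is gone. The sketch below is standalone illustration, not the QuorumCnxManager code, and shows the corrected check:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Standalone sketch, not QuorumCnxManager code: read() returns -1 at
// end-of-stream, and a loop that ignores a negative return would spin
// at 100% cpu because hasRemaining() stays true forever.
public class EofReadLoop {
    static int readFully(ReadableByteChannel channel, ByteBuffer message)
            throws IOException {
        int numbytes = 0;
        while (message.hasRemaining()) {
            int n = channel.read(message);
            if (n < 0) {                      // EOF: fail fast instead of looping
                throw new IOException("Channel eof before end");
            }
            numbytes += n;
        }
        return numbytes;
    }

    public static void main(String[] args) throws IOException {
        // The peer "sends" only 4 of the 8 expected bytes, then closes.
        ReadableByteChannel truncated =
                Channels.newChannel(new ByteArrayInputStream(new byte[4]));
        try {
            readFully(truncated, ByteBuffer.allocate(8));
        } catch (IOException expected) {
            System.out.println(expected.getMessage());  // prints: Channel eof before end
        }
    }
}
```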
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Attachment: ZOOKEEPER-893.patch Adding a test, and removing from RecvWorker.run() an if statement that became unnecessary with this patch. I'll be adding a patch for the 3.3 branch shortly. ZooKeeper high cpu usage when invalid requests -- Key: ZOOKEEPER-893 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) Reporter: Thijs Terlouw Assignee: Thijs Terlouw Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-893.patch, ZOOKEEPER-893.patch Original Estimate: 1h Remaining Estimate: 1h When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:
{noformat}
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
{noformat}
{noformat}
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
{noformat}
How to replicate this bug: perform an nmap portscan against your zookeeper server (nmap -sV -n your.ip.here -p4181), wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Attachment: ZOOKEEPER-893-3.3.patch Thanks, Thijs. Adding 3.3 patch. ZooKeeper high cpu usage when invalid requests -- Key: ZOOKEEPER-893 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) Reporter: Thijs Terlouw Assignee: Thijs Terlouw Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, ZOOKEEPER-893.patch Original Estimate: 1h Remaining Estimate: 1h When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:
{noformat}
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
{noformat}
{noformat}
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
{noformat}
How to replicate this bug: perform an nmap portscan against your zookeeper server (nmap -sV -n your.ip.here -p4181), wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Status: Patch Available (was: Open) ZooKeeper high cpu usage when invalid requests -- Key: ZOOKEEPER-893 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) Reporter: Thijs Terlouw Assignee: Thijs Terlouw Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, ZOOKEEPER-893.patch Original Estimate: 1h Remaining Estimate: 1h When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:
{noformat}
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
{noformat}
{noformat}
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
{noformat}
How to replicate this bug: perform an nmap portscan against your zookeeper server (nmap -sV -n your.ip.here -p4181), wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-901) Redesign of QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921997#action_12921997 ] Flavio Junqueira commented on ZOOKEEPER-901: It is a good point, Pat. It crossed my mind, but I thought it would be overkill to use netty. However, if it is simpler to have it for compatibility and uniformity purposes, then we should consider it. Redesign of QuorumCnxManager Key: ZOOKEEPER-901 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-901 Project: Zookeeper Issue Type: Improvement Components: leaderElection Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Fix For: 3.4.0 QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following: # Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers; # Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
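The single-thread-plus-selector approach proposed in the second point can be sketched with plain java.nio. This is only an illustration of the pattern under discussion, not the eventual ZOOKEEPER-901 design, and the names are mine: one selector multiplexes accept and read readiness, so no single peer can hold up connection establishment with the others.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Illustration of the single-thread-plus-selector pattern only, not the
// eventual ZOOKEEPER-901 implementation.
public class SelectorLoopSketch {
    // Accept every connection that is ready right now, registering each
    // accepted peer for reads on the same selector; bounded wait, no
    // per-connection threads.
    static int acceptReady(Selector selector, ServerSocketChannel server)
            throws IOException {
        int accepted = 0;
        if (selector.select(1000) == 0) {
            return 0;                               // nothing ready within 1s
        }
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel peer = server.accept();   // returns immediately
                peer.configureBlocking(false);
                peer.register(selector, SelectionKey.OP_READ);
                accepted++;
            }
        }
        selector.selectedKeys().clear();
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);            // never block in accept()
        server.bind(new InetSocketAddress(0));      // ephemeral port
        server.register(selector, SelectionKey.OP_ACCEPT);
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
        Socket client = new Socket("127.0.0.1", port);
        System.out.println("accepted: " + acceptReady(selector, server));
        client.close();
        server.close();
        selector.close();
    }
}
```

In a full loop the same thread would also handle OP_READ and OP_CONNECT keys, which is what replaces the per-connection SendWorker/RecvWorker thread pairs the issue complains about.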
[jira] Updated: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-881: --- Resolution: Fixed Status: Resolved (was: Patch Available) Ben forgot to close this issue. ZooKeeperServer.loadData loads database twice - Key: ZOOKEEPER-881 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-881.patch zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but is unnecessary. A patch should be trivial. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira resolved ZOOKEEPER-881. Resolution: Fixed Committed to the 3.3 branch (Committed revision 1023935.) ZooKeeperServer.loadData loads database twice - Key: ZOOKEEPER-881 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-881.patch zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but is unnecessary. A patch should be trivial. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-786) Exception in ZooKeeper.toString
[ https://issues.apache.org/jira/browse/ZOOKEEPER-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-786: --- Priority: Minor (was: Major) Fix Version/s: (was: 3.3.2) Exception in ZooKeeper.toString --- Key: ZOOKEEPER-786 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-786 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: Mac OS X, x86 Reporter: Stephen Green Priority: Minor Fix For: 3.4.0 When trying to call ZooKeeper.toString during client disconnections, an exception can be generated:
{noformat}
[04/06/10 15:39:57.744] ERROR Error while calling watcher
java.lang.Error: java.net.SocketException: Socket operation on non-socket
at sun.nio.ch.Net.localAddress(Net.java:128)
at sun.nio.ch.SocketChannelImpl.localAddress(SocketChannelImpl.java:430)
at sun.nio.ch.SocketAdaptor.getLocalAddress(SocketAdaptor.java:147)
at java.net.Socket.getLocalSocketAddress(Socket.java:717)
at org.apache.zookeeper.ClientCnxn.getLocalSocketAddress(ClientCnxn.java:227)
at org.apache.zookeeper.ClientCnxn.toString(ClientCnxn.java:183)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at org.apache.zookeeper.ZooKeeper.toString(ZooKeeper.java:1486)
at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2794)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2677)
at java.util.Formatter.format(Formatter.java:2433)
at java.util.Formatter.format(Formatter.java:2367)
at java.lang.String.format(String.java:2769)
at com.echonest.cluster.ZooContainer.process(ZooContainer.java:544)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
Caused by: java.net.SocketException: Socket operation on non-socket
at sun.nio.ch.Net.localInetAddress(Native Method)
at sun.nio.ch.Net.localAddress(Net.java:125)
... 15 more
{noformat}
-- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
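The trace above shows a java.lang.Error escaping from Socket.getLocalSocketAddress while the client is mid-disconnect, which then breaks toString. A minimal defensive sketch of the idea (all names invented; this is not the actual ClientCnxn code) is to contain the Error and fall back to a placeholder string:

```java
import java.net.Socket;

public class SafeToString {
    // Hypothetical helper: describe a socket without letting a
    // disconnect-time failure propagate out of toString.
    static String describe(Socket sock) {
        try {
            return "local:" + sock.getLocalSocketAddress();
        } catch (Error | RuntimeException e) {
            // Socket.getLocalSocketAddress can surface a java.lang.Error
            // ("Socket operation on non-socket") during disconnection;
            // fall back instead of killing the caller.
            return "local:unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(null)); // prints local:unknown
    }
}
```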
[jira] Commented: (ZOOKEEPER-786) Exception in ZooKeeper.toString
[ https://issues.apache.org/jira/browse/ZOOKEEPER-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922216#action_12922216 ] Flavio Junqueira commented on ZOOKEEPER-786: Since this seems to be a minor issue and to avoid further delays with 3.3.2, I propose we move it to 3.4.0. Exception in ZooKeeper.toString --- Key: ZOOKEEPER-786 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-786 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: Mac OS X, x86 Reporter: Stephen Green Fix For: 3.4.0 When trying to call ZooKeeper.toString during client disconnections, an exception can be generated: [04/06/10 15:39:57.744] ERROR Error while calling watcher java.lang.Error: java.net.SocketException: Socket operation on non-socket at sun.nio.ch.Net.localAddress(Net.java:128) at sun.nio.ch.SocketChannelImpl.localAddress(SocketChannelImpl.java:430) at sun.nio.ch.SocketAdaptor.getLocalAddress(SocketAdaptor.java:147) at java.net.Socket.getLocalSocketAddress(Socket.java:717) at org.apache.zookeeper.ClientCnxn.getLocalSocketAddress(ClientCnxn.java:227) at org.apache.zookeeper.ClientCnxn.toString(ClientCnxn.java:183) at java.lang.String.valueOf(String.java:2826) at java.lang.StringBuilder.append(StringBuilder.java:115) at org.apache.zookeeper.ZooKeeper.toString(ZooKeeper.java:1486) at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2794) at java.util.Formatter$FormatSpecifier.print(Formatter.java:2677) at java.util.Formatter.format(Formatter.java:2433) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.echonest.cluster.ZooContainer.process(ZooContainer.java:544) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488) Caused by: java.net.SocketException: Socket operation on non-socket at sun.nio.ch.Net.localInetAddress(Native Method) at sun.nio.ch.Net.localAddress(Net.java:125) ... 
15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922238#action_12922238 ] Flavio Junqueira commented on ZOOKEEPER-855: +1, I'll commit this in a minute. clientPortBindAddress should be clientPortAddress - Key: ZOOKEEPER-855 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855 Project: Zookeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.0, 3.3.1 Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-855.patch The server documentation states that the configuration parameter for binding to a specific IP address is clientPortBindAddress. The code expects the parameter clientPortAddress. The documentation for the 3.3.X versions needs to be changed to reflect the correct parameter. This parameter was added in ZOOKEEPER-635. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
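For reference, a zoo.cfg fragment using the parameter name the code actually reads (the address value below is just an example):

```
# clientPortAddress, not clientPortBindAddress, is what the server parses.
clientPort=2181
clientPortAddress=127.0.0.1
```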
[jira] Updated: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-855: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks, Jared, I have just committed this: Branch 3.3: Committed revision 1024022. Trunk: Committed revision 1024029. clientPortBindAddress should be clientPortAddress - Key: ZOOKEEPER-855 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855 Project: Zookeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.0, 3.3.1 Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-855.patch, ZOOKEEPER-855.patch The server documentation states that the configuration parameter for binding to a specific IP address is clientPortBindAddress. The code expects the parameter clientPortAddress. The documentation for the 3.3.X versions needs to be changed to reflect the correct parameter. This parameter was added in ZOOKEEPER-635. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-855: --- Attachment: ZOOKEEPER-855.patch I'm uploading the patch I committed. The original patch was modifying the HTML instead of the XML source. clientPortBindAddress should be clientPortAddress - Key: ZOOKEEPER-855 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855 Project: Zookeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.0, 3.3.1 Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-855.patch, ZOOKEEPER-855.patch The server documentation states that the configuration parameter for binding to a specific IP address is clientPortBindAddress. The code expects the parameter clientPortAddress. The documentation for the 3.3.X versions needs to be changed to reflect the correct parameter. This parameter was added in ZOOKEEPER-635. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-901) Redesign of QuorumCnxManager
Redesign of QuorumCnxManager Key: ZOOKEEPER-901 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-901 Project: Zookeeper Issue Type: Improvement Components: leaderElection Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Fix For: 3.4.0 QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following: # Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers; # Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
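The second point of the proposal can be sketched with plain Java NIO: a single selector-driven thread accepts and later reads from every peer channel, so no per-connection Send/Recv thread pair is needed. This is an illustrative sketch under invented names, not the proposed QuorumCnxManager code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    static int acceptedCount = 0; // peers picked up by the single thread

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false); // nothing in the loop may block
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Simulate another server connecting on the election port.
        SocketChannel peer = SocketChannel.open(server.getLocalAddress());

        // One pass of the event loop; a real manager would loop forever and
        // also register OP_CONNECT for non-blocking outgoing connections.
        selector.select(5000);
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel accepted = server.accept();
                accepted.configureBlocking(false);
                accepted.register(selector, SelectionKey.OP_READ); // same thread reads later
                acceptedCount++;
            }
        }
        selector.selectedKeys().clear();

        peer.close();
        server.close();
        selector.close();
        System.out.println("connections handled by one thread: " + acceptedCount);
    }
}
```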
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921467#action_12921467 ] Flavio Junqueira commented on ZOOKEEPER-885: I'm not sure it is that simple, Dave. The problem is that pings do not require writes to disk, and in the scenario that Alexandre describes, there are only pings being processed. Why is the background I/O load affecting the processing of ZooKeeper? And in particular, why are sessions expiring as a consequence of this background I/O load? Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. 
Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921557#action_12921557 ] Flavio Junqueira commented on ZOOKEEPER-885: I've been running it and there is no traffic to the disk while the clients are watching. We generate a snapshot every snapCount transactions, but given that there are no transactions generated, no transaction is appended to the log and no new snapshot is written. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Restarting discussion on ZooKeeper as a TLP
+1. Frankly, I don't see concrete benefits for the community with ZooKeeper becoming a TLP, but perhaps it will become clear over time. Now it is certainly cool to have our own top-level domain: http://zookeeper.apache.org/ rocks! -Flavio
On Oct 14, 2010, at 1:00 PM, Benjamin Reed wrote: +1 ben
On 10/14/2010 11:47 AM, Henry Robinson wrote: +1, I agree that we've addressed most outstanding concerns, we're ready for TLP. Henry
On 14 October 2010 13:29, Mahadev Konar maha...@yahoo-inc.com wrote: +1 for moving to TLP. Thanks for starting the vote Pat. mahadev
On 10/13/10 2:10 PM, "Patrick Hunt" ph...@apache.org wrote: In March of this year we discussed a request from the Apache Board, and Hadoop PMC, that we become a TLP rather than a subproject of Hadoop: Original discussion http://markmail.org/thread/42cobkpzlgotcbin I originally voted against this move, my primary concern being that we were not "ready" to move to tlp status given our small contributor base and limited contributor diversity. However I'd now like to revisit that discussion/decision. Since that time the team has been working hard to attract new contributors, and we've seen significant new contributions come in. There has also been feedback from board/pmc addressing many of these concerns (both on the list and in private). I am now less concerned about this issue and don't see it as a blocker for us to move to TLP status. A second concern was that by becoming a TLP the project would lose its connection with Hadoop, a big source of new users for us. I've been assured (and you can see with the other projects that have moved to tlp status; pig/hive/hbase/etc...) that this connection will be maintained. The Hadoop ZooKeeper tab for example will redirect to our new homepage. Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction. I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles. Regards, Patrick
flavio junqueira research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921218#action_12921218 ] Flavio Junqueira commented on ZOOKEEPER-885: Hi Alexandre, When you load the machines running the zookeeper servers with the dd command, how much time elapses between running dd and observing the connections expiring? I haven't been able to reproduce it, and I wonder how long the problem takes to manifest. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920713#action_12920713 ] Flavio Junqueira commented on ZOOKEEPER-885: I remember fixing an issue a while back with CommitProcessor, which was being killed by a runtime exception. As Pat pointed out, it does look like the pipeline is stalling, but it is still unclear why, and I couldn't find anything that indicates the cause. Let me try to reproduce it. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. 
Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-884: --- Attachment: ZOOKEEPER-884.patch This is a very simple patch, and it fixes mostly documentation and comments. Given the pace that patches are making progress in ZooKeeper these days, I'll +1 it myself (at the risk of not having any value :-) ). Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Attachments: ZOOKEEPER-884.patch We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-884: --- Status: Patch Available (was: Open) Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Attachments: ZOOKEEPER-884.patch We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira reassigned ZOOKEEPER-884: -- Assignee: Flavio Junqueira Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916410#action_12916410 ] Flavio Junqueira commented on ZOOKEEPER-882: (I meant to post a comment yesterday, but jira decided to re-index right at the time) I like the way you structured the restore loop, it is simpler and easier to read, and I can't find any problem with it. About the severity of the bug, my interpretation is that it is harmless to re-execute the transaction, but still worth proposing a patch. Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Priority: Minor Attachments: 882.diff, restore On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-883) Idle cluster increasingly consumes CPU resources
[ https://issues.apache.org/jira/browse/ZOOKEEPER-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916446#action_12916446 ] Flavio Junqueira commented on ZOOKEEPER-883: I think this issue is related to ZOOKEEPER-880. It seems that the connections nagios creates start a RecvWorker and a SendWorker, and once they close, they kill RecvWorker but not SendWorker, so for every notification sent there is an orphan RecvWorker. I see two options: # Patch it so that it also kills the SendWorker instance; # Decline connection requests from unknown servers. I'm also curious to understand why you guys are monitoring the election port. Idle cluster increasingly consumes CPU resources Key: ZOOKEEPER-883 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-883 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Lars George Attachments: Archive.zip Monitoring the ZooKeeper nodes by polling the various ports using Nagios' open port checks seems to cause a substantial rise in the CPU used by the ZooKeeper daemons. Over the course of a week an idle cluster grew from a baseline 2% to 10% CPU usage. Attached is a stack dump and logs showing the occupied threads. At the end the daemon starts failing with "too many open files" errors as all handles are used up. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-883) Idle cluster increasingly consumes CPU resources
[ https://issues.apache.org/jira/browse/ZOOKEEPER-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916476#action_12916476 ] Flavio Junqueira commented on ZOOKEEPER-883: I meant to say that there is an orphan SendWorker, not an orphan RecvWorker. Idle cluster increasingly consumes CPU resources Key: ZOOKEEPER-883 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-883 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Lars George Attachments: Archive.zip Monitoring the ZooKeeper nodes by polling the various ports using Nagios' open port checks seems to cause a substantial rise in the CPU used by the ZooKeeper daemons. Over the course of a week an idle cluster grew from a baseline 2% to 10% CPU usage. Attached is a stack dump and logs showing the occupied threads. At the end the daemon starts failing with "too many open files" errors as all handles are used up. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
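The first option proposed above, killing the SendWorker counterpart when its RecvWorker dies, can be sketched as follows. The class and field names are simplified stand-ins for the real QuorumCnxManager inner classes, not the actual fix:

```java
public class WorkerPairSketch {
    static class SendWorker {
        volatile boolean running = true;
        void finish() { running = false; }
    }

    static class RecvWorker {
        final SendWorker counterpart; // sender for the same channel
        volatile boolean running = true;
        RecvWorker(SendWorker counterpart) { this.counterpart = counterpart; }
        void finish() {
            running = false;
            counterpart.finish(); // the missing step: stop the sender too
        }
    }

    public static void main(String[] args) {
        SendWorker sw = new SendWorker();
        RecvWorker rw = new RecvWorker(sw);
        rw.finish();                    // e.g. IOException: Channel eof
        System.out.println(sw.running); // prints false: no orphan sender
    }
}
```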
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916065#action_12916065 ] Flavio Junqueira commented on ZOOKEEPER-882: Hi Jared, Thanks for bringing this up. It doesn't look like that extra call to next() is necessary. If there is another file to process, then the call to next will return true and we will keep processing transactions, no? Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Priority: Minor Attachments: 882.diff On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916070#action_12916070 ] Flavio Junqueira commented on ZOOKEEPER-882: I'm also not clear on your second point. If you check FileTxnIterator.init(), then it seems to me that the zxid passed as a parameter should be included, so not dt.lastProcessedZxid+1. What am I missing? Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Priority: Minor Attachments: 882.diff On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916145#action_12916145 ] Flavio Junqueira commented on ZOOKEEPER-882: I agree with your description of the behavior of next, and sounds right to me that we should be setting hdr and calling return next(); at the end of the catch block. Regarding init(), we first use the value of zxid to determine which log files to read: all log files tagged with a value higher than zxid and the last log file that is less than zxid. Next we iterate over the log files until hdr.getZxid() is greater or equal to zxid (should be zxid really). This guarantees that the next call to next(), after init() returns, will return zxid+1. Does it sound right to you? Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Priority: Minor Attachments: 882.diff On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
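The init()/next() contract discussed above can be modeled with a toy iterator over an in-memory list of zxids: initialization skips everything up to and including the snapshot's last zxid, so the first next() yields the transaction after it. This is a toy model for illustration, not the FileTxnLog code:

```java
public class TxnIteratorSketch {
    final long[] log; // zxids in the transaction log, in order
    int pos = 0;

    TxnIteratorSketch(long[] log, long snapshotZxid) {
        this.log = log;
        // "init": advance past every transaction already in the snapshot
        while (pos < log.length && log[pos] <= snapshotZxid) {
            pos++;
        }
    }

    Long next() { // null when the log is exhausted
        return pos < log.length ? log[pos++] : null;
    }

    public static void main(String[] args) {
        TxnIteratorSketch it =
            new TxnIteratorSketch(new long[] {1, 2, 3, 4, 5}, 3);
        System.out.println(it.next()); // prints 4: one past the snapshot
    }
}
```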
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915825#action_12915825 ] Flavio Junqueira commented on ZOOKEEPER-702: +1, I'm pretty happy with the patch. GSoC 2010: Failure Detector Model - Key: ZOOKEEPER-702 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702 Project: Zookeeper Issue Type: Wish Reporter: Henry Robinson Assignee: Abmar Barros Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch Failure Detector Module Possible Mentor Henry Robinson (henry at apache dot org) Requirements Java, some distributed systems knowledge, comfort implementing distributed systems protocols Description ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. 
This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
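As a baseline for the comparison the project proposes, the tick-counting "timeout" detector described above can be sketched in a few lines (all names invented; this is not the ZooKeeper implementation):

```java
public class TickFailureDetector {
    final int limit;              // ticks allowed without a heartbeat
    int ticksSinceHeartbeat = 0;

    TickFailureDetector(int limit) { this.limit = limit; }

    void heartbeat() { ticksSinceHeartbeat = 0; } // peer is alive again

    boolean tick() { // advance one tick; true means the peer is suspected
        return ++ticksSinceHeartbeat > limit;
    }

    public static void main(String[] args) {
        TickFailureDetector fd = new TickFailureDetector(2);
        fd.heartbeat();
        System.out.println(fd.tick()); // prints false: 1 missed tick
        System.out.println(fd.tick()); // prints false: 2 missed ticks
        System.out.println(fd.tick()); // prints true: past the limit
    }
}
```

A phi-accrual detector replaces the fixed limit with a suspicion level computed from the observed heartbeat-interval distribution, which is what makes it more tunable.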
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available) Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1. Fail the ZK leader. 2. Let leader election finish. Restart the leader and let it join. 3. Repeat. After a few rounds leader election takes anywhere from 25-60 seconds to finish. Note: we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyway, so logs from that node shouldn't matter. Look for START HERE. Logs after that point are the ones of interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Patch Available (was: Open) Thanks for the comments, Ben. I have modified zookeeperAdmin and added the zookeeper. prefix to the code. Regarding your question, initiateConnection is called from two methods: testInitiateConnection (used only in tests) and connectOne. connectOne is synchronized. Do you still see an issue?
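The concurrency argument in the comment above, that initiateConnection is safe because its only non-test caller, connectOne, is synchronized, can be illustrated with a minimal sketch. Class and method names below are modeled loosely on QuorumCnxManager but are invented for illustration; this is not ZooKeeper's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class CnxManagerSketch {
    private final Map<Long, String> connections = new HashMap<>();

    /** Synchronized: only one thread at a time can initiate a connection,
     *  so initiateConnection never runs concurrently with itself. */
    synchronized void connectOne(long sid) {
        if (connections.containsKey(sid)) return; // already connected
        initiateConnection(sid);
    }

    private void initiateConnection(long sid) {
        connections.put(sid, "channel-to-" + sid); // stand-in for opening a socket
    }

    synchronized int connectionCount() { return connections.size(); }
}
```

Because every caller goes through the synchronized connectOne, a second thread racing to connect to the same server id simply finds the connection already recorded and returns.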
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-702: --- Status: Open (was: Patch Available) I forgot to mention that the patch does not apply cleanly as uploaded. I had to delete the first two lines (generated by eclipse), but once I did, it applied cleanly. Abmar, could you upload a new patch? My +1 still holds...

GSoC 2010: Failure Detector Model
Key: ZOOKEEPER-702
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
Project: Zookeeper
Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch

Failure Detector Module
Possible Mentor: Henry Robinson (henry at apache dot org)
Requirements: Java, some distributed systems knowledge, comfort implementing distributed systems protocols
Description: ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. This is a great project if you are interested in distributed algorithms, or want to help refactor some of ZooKeeper's internal code.
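The phi-accrual idea referenced above can be sketched in a few lines of Java. This is a toy version that models heartbeat inter-arrival times as exponentially distributed (the referenced paper uses a normal distribution); the class and method names are invented for illustration, not part of any ZooKeeper or Cassandra API.

```java
import java.util.ArrayDeque;

class PhiAccrual {
    private final ArrayDeque<Double> intervals = new ArrayDeque<>();
    private double lastHeartbeat = -1;

    /** Record a heartbeat arrival at time `now` (seconds). */
    void heartbeat(double now) {
        if (lastHeartbeat >= 0) {
            intervals.addLast(now - lastHeartbeat);
            if (intervals.size() > 100) intervals.removeFirst(); // sliding window
        }
        lastHeartbeat = now;
    }

    /** phi = -log10(P(next heartbeat still pending after this long)),
     *  with P modeled as exp(-elapsed/mean), i.e. exponential inter-arrivals.
     *  Unlike a fixed tick count, phi grows continuously as the silence
     *  lengthens, so the suspicion threshold is tunable. */
    double phi(double now) {
        double mean = intervals.stream().mapToDouble(Double::doubleValue)
                               .average().orElse(1.0);
        double elapsed = now - lastHeartbeat;
        return -Math.log10(Math.exp(-elapsed / mean));
    }
}
```

A monitor would suspect the peer once phi crosses some threshold (say 2 or 3), which the operator can tune per deployment instead of hard-coding a tick limit.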
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915625#action_12915625 ] Flavio Junqueira commented on ZOOKEEPER-880: J-D, Has it happened just once or is it reproducible? Does it also happen with 3.3?

QuorumCnxManager$SendWorker grows without bounds
Key: ZOOKEEPER-880
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack

We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
{noformat}
tickTime=3000
dataDir=/somewhere_thats_not_tmp
clientPort=2181
initLimit=10
syncLimit=5
server.0=sv4borg9:2888:3888
server.1=sv4borg10:2888:3888
server.2=sv4borg11:2888:3888
server.3=sv4borg12:2888:3888
server.4=sv4borg13:2888:3888
{noformat}
The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
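The failure mode in this report, sender threads accumulating while receivers die, is the kind of leak that arises when only one half of a paired worker is torn down. A toy model (invented classes, not ZooKeeper's actual code) of the invariant a fix would enforce:

```java
/** Toy model of paired per-connection worker threads. */
class PairedWorkers {
    static class SendWorker {
        volatile boolean running = true;
        void finish() { running = false; } // stop the sender loop
    }
    static class RecvWorker {
        final SendWorker counterpart;
        RecvWorker(SendWorker counterpart) { this.counterpart = counterpart; }
        /** On channel EOF/IOException, tear down BOTH halves; leaving the
         *  counterpart running is exactly how senders would accumulate,
         *  one leaked sender per broken connection. */
        void onChannelBroken() { counterpart.finish(); }
    }
}
```

If each failed connection attempt (for example, a monitoring probe that connects and immediately closes) spawns a pair but only the receiver exits, the sender count grows by one per probe.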
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available)
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Patch Available (was: Open) Thanks for reviewing it, Vishal. I have fixed the LOG.warn you pointed out and uploaded new patch files.
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915034#action_12915034 ] Flavio Junqueira commented on ZOOKEEPER-702: In the previous comment, hopefully it was clear that I meant to say that the new tests are NOT working as expected. Apologies for the typo.
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-702: --- Status: Open (was: Patch Available) Thanks for the updated patch, Abmar. The new tests, however, are not working as expected. More specifically, the methods in QuorumBase (createLearnersFD and createSessionsFD) are not being overridden as expected, which affects all new hammer tests. I haven't checked the other tests, but I suspect they suffer from the same problem. I'm canceling the patch for now.
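The overriding problem described in the comment above can be shown in miniature. A subclass override of a factory method only takes effect if the base class actually dispatches through that method; names below are invented, modeled loosely on QuorumBase's createLearnersFD, and are not the actual test code.

```java
class FdOverride {
    static class QuorumBaseSketch {
        /** Overridable factory; subclasses supply a different detector. */
        protected String createLearnersFD() { return "default-fd"; }

        /** This works only because setUp dispatches through the factory
         *  method. If it constructed the detector directly (bypassing the
         *  factory), subclass overrides would silently be ignored, which is
         *  the kind of symptom the canceled patch exhibited. */
        String setUp() { return createLearnersFD(); }
    }

    static class PhiQuorumTest extends QuorumBaseSketch {
        @Override protected String createLearnersFD() { return "phi-accrual-fd"; }
    }
}
```

With correct dispatch, each hammer-test subclass gets its own detector; with direct construction in the base class, every test would silently run the default one.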
[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914822#action_12914822 ] Flavio Junqueira commented on ZOOKEEPER-823: Here is another instance:
{noformat}
Testcase: testPathValidation took 1.865 sec
Caused an ERROR
KeeperErrorCode = ConnectionLoss for /chrootclienttest
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /chrootclienttest
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:640)
at org.apache.zookeeper.test.ChrootClientTest.setUp(ChrootClientTest.java:42)
{noformat}
I'm on Mac OS X 10.5.8, java build 1.6.0_20-b02-279-9M3165.

update ZooKeeper java client to optionally use Netty for connections
Key: ZOOKEEPER-823
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823
Project: Zookeeper
Issue Type: New Feature
Components: java client
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Fix For: 3.4.0
Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch

This jira will port the client side connection code to use netty rather than direct nio.
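Intermittent ConnectionLoss during test setup, as in the trace above, is often worked around by retrying the failing operation. Below is a generic retry helper of that kind; it is a sketch under my own naming, not part of ZooKeeper's API, and backoff between attempts is omitted for brevity.

```java
import java.util.concurrent.Callable;

class Retry {
    /** Run op up to maxAttempts times, rethrowing the last failure if
     *  every attempt fails. Transient errors (e.g. a ConnectionLoss-style
     *  exception) succeed on a later attempt. */
    static <T> T withRetries(Callable<T> op, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // transient failure: try again (backoff omitted)
            }
        }
        throw new RuntimeException("all " + maxAttempts + " attempts failed", last);
    }
}
```

In a test, the flaky create call would be wrapped as `Retry.withRetries(() -> doCreate(), 3)` so a single dropped connection does not fail the whole suite.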
[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913591#action_12913591 ] Flavio Junqueira commented on ZOOKEEPER-823: NettyNettySuiteTest is failing intermittently for me. I'm attaching logs for a run that failed.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available) Going to submit patches that introduce a system property.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch