[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933075#action_12933075 ] Mahadev konar commented on ZOOKEEPER-366: - I am all for making it for 3.3.3. I'd be willing to fix it, but probably not this week. Session timeout detection can go wrong if the leader system time changes Key: ZOOKEEPER-366 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366 Project: Zookeeper Issue Type: Bug Components: quorum, server Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.3.3, 3.4.0 Attachments: ZOOKEEPER-366.patch the leader tracks session expirations by calculating when a session will timeout and then periodically checking to see what needs to be timed out based on the current time. this works great as long as the leader's clock progresses at a steady pace. the problem comes when there are big (session sized) changes in the clock, by ntp for example. if time gets adjusted forward, all the sessions could timeout immediately. if time goes backward, sessions that should timeout may take a lot longer to actually expire. this is really just a leader issue. the easiest way to deal with this is to have the leader relinquish leadership if it detects a big jump forward in time. when a new leader gets elected, it will recalculate timeouts of active sessions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
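The failure mode described above is easy to see if you contrast expiry based on the wall clock with expiry based on a monotonic elapsed-time source. A minimal sketch (hypothetical helper names, not the actual SessionTrackerImpl code):

```java
// Sketch of the two expiry schemes. With the wall-clock scheme, an NTP jump
// forward past the absolute deadline expires every session at once; a jump
// backward postpones expiry by the size of the jump. A monotonic scheme
// (System.nanoTime()) is immune to wall-clock adjustments.
class ExpirySketch {
    // Wall-clock scheme: absolute deadline compared against "now".
    static boolean expiredWallClock(long deadlineMillis, long nowWallMillis) {
        return nowWallMillis >= deadlineMillis;
    }

    // Monotonic scheme: elapsed time since the last heartbeat.
    static boolean expiredMonotonic(long lastHeartbeatNanos, long timeoutMillis) {
        long elapsedMillis = (System.nanoTime() - lastHeartbeatNanos) / 1_000_000L;
        return elapsedMillis >= timeoutMillis;
    }
}
```

With a 30s timeout, a deadline 30s out, and a clock that jumps forward a minute after one real second, the wall-clock check reports the session expired while the monotonic check does not; that mismatch is exactly why a large forward jump on the leader mass-expires sessions.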
[jira] Commented: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932498#action_12932498 ] Mahadev konar commented on ZOOKEEPER-905: - The patch looks good and applies to trunk. +1 to the patch, I am marking this as PA. enhance zkServer.sh for easier zookeeper automation-izing - Key: ZOOKEEPER-905 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905 Project: Zookeeper Issue Type: Improvement Components: scripts Reporter: Nicholas Harteau Assignee: Nicholas Harteau Priority: Minor Fix For: 3.4.0 Attachments: zkServer.sh.diff zkServer.sh is good at starting zookeeper and figuring out the right options to pass along. unfortunately if you want to wrap zookeeper startup/shutdown in any significant way, you have to reimplement a bunch of the logic there. the attached patch addresses a couple simple issues: 1. add a 'start-foreground' option to zkServer.sh - this allows things that expect to manage a foregrounded process (daemontools, launchd, etc) to use zkServer.sh instead of rolling their own to launch zookeeper 2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper from the script, just give me the command you'd normally use to exec zookeeper. I found this useful when writing automation to start/stop zookeeper as part of smoke testing zookeeper-based applications 3. Deal more gracefully with supplying alternate configuration files to zookeeper - currently the script assumes all config files reside in $ZOOCFGDIR - also useful for smoke testing 4. communicate extra info (JMX enabled) about zookeeper on STDERR rather than STDOUT (necessary for #2) 5. fixes an issue on macos where readlink doesn't have the '-f' option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-905: Status: Patch Available (was: Open) enhance zkServer.sh for easier zookeeper automation-izing - Key: ZOOKEEPER-905 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905 Project: Zookeeper Issue Type: Improvement Components: scripts Reporter: Nicholas Harteau Assignee: Nicholas Harteau Priority: Minor Fix For: 3.4.0 Attachments: zkServer.sh.diff zkServer.sh is good at starting zookeeper and figuring out the right options to pass along. unfortunately if you want to wrap zookeeper startup/shutdown in any significant way, you have to reimplement a bunch of the logic there. the attached patch addresses a couple simple issues: 1. add a 'start-foreground' option to zkServer.sh - this allows things that expect to manage a foregrounded process (daemontools, launchd, etc) to use zkServer.sh instead of rolling their own to launch zookeeper 2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper from the script, just give me the command you'd normally use to exec zookeeper. I found this useful when writing automation to start/stop zookeeper as part of smoke testing zookeeper-based applications 3. Deal more gracefully with supplying alternate configuration files to zookeeper - currently the script assumes all config files reside in $ZOOCFGDIR - also useful for smoke testing 4. communicate extra info (JMX enabled) about zookeeper on STDERR rather than STDOUT (necessary for #2) 5. fixes an issue on macos where readlink doesn't have the '-f' option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-756) some cleanup and improvements for zooinspector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932503#action_12932503 ] Mahadev konar commented on ZOOKEEPER-756: - @colin, Any updates on the patch? As patrick mentioned above, the patch doesn't apply to the trunk. some cleanup and improvements for zooinspector -- Key: ZOOKEEPER-756 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-756 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.0 Reporter: Thomas Koch Assignee: Colin Goodheart-Smithe Fix For: 3.4.0 Attachments: zooInspectorChanges.patch Copied from the already closed ZOOKEEPER-678: * specify the exact URL where the icons are from. It's best to include the link also in the NOTICE.txt file. It seems that zooinspector finds its icons only if the icons folder is in the current path. But when I install zooinspector as part of the Zookeeper Debian package, I want to be able to call it regardless of the current path. Could you use getResources or something so that I can point to the icons location from the wrapper shell script? Can I place the zooinspector config files in /etc/zookeeper/zooinspector/ ? Could I give zooinspector a property to point to the config file location? There are several places where viewers is misspelled as Veiwers. Please do a case-insensitive search for veiw to correct these. Even the config file defaultNodeVeiwers.cfg is misspelled like this. This has the potential to confuse the hell out of people when debugging something! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-931) Documentation for Hedwig
[ https://issues.apache.org/jira/browse/ZOOKEEPER-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-931: Fix Version/s: 3.4.0 Documentation for Hedwig Key: ZOOKEEPER-931 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-931 Project: Zookeeper Issue Type: Task Components: contrib-hedwig Reporter: Erwin Tam Fix For: 3.4.0 This is a tracking jira for providing documentation to Hedwig on the Apache site. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930720#action_12930720 ] Mahadev konar commented on ZOOKEEPER-896: - this is interesting. Botond, can you explain your Kerberos setup? Who generates the Kerberos tokens? I am very interested in plugging in Kerberos with zookeeper. Improve C client to support dynamic authentication schemes -- Key: ZOOKEEPER-896 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Reporter: Botond Hejj Assignee: Botond Hejj Fix For: 3.4.0 Attachments: ZOOKEEPER-896.patch When we started exploring zookeeper for our requirements, we found that the authentication mechanism is not flexible enough. We want to use Kerberos for authentication, but using the current API we ran into a few problems. The idea is that we get a Kerberos token on the client side and then send that token to the server with a kerberos scheme. A server-side authentication plugin can use that token to authenticate the client and also use the token for authorization. We ran into two problems with this approach: 1. A different Kerberos token is needed for each different server that the client can connect to, since Kerberos uses mutual authentication. That means when the client acquires this Kerberos token it has to know which server it connects to and generate the token accordingly. The client currently can't generate a token for a specific server. The token stored in the auth_info is used for all the servers. 2. The Kerberos token might have an expiry time, so if the client loses the connection to the server and then tries to reconnect, it should acquire a new token. That is not possible currently, since the token is stored in auth_info and reused for every connection. The problem can be solved if we allow the client to register a callback for authentication instead of a static token.
This can be a callback with an argument which passes the current host string. The zookeeper client code could call this callback before it sends the authentication info to the server, to get a fresh server-specific token. This would solve our problem with the Kerberos authentication and could also be used for other, more dynamic authentication schemes. The solution could also be generalized for the Java client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
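The callback-based scheme the report proposes can be sketched as a small interface (illustrative names only; this is not an actual ZooKeeper API): instead of registering one static token, the client registers a provider that is asked for a fresh, server-specific token each time a connection is established or re-established.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical per-server auth callback, sketched in Java since the report
// notes the idea generalizes to the Java client as well.
interface AuthTokenProvider {
    // Called just before auth info is sent, with the host the client is
    // about to authenticate against (e.g. "zk1.example.com:2181").
    byte[] tokenFor(String hostPort);
}

// Trivial provider for illustration; a real one would ask the Kerberos
// libraries for a service ticket for the principal derived from hostPort,
// so each server (and each reconnect) gets a fresh, unexpired token.
class KerberosStyleProvider implements AuthTokenProvider {
    @Override
    public byte[] tokenFor(String hostPort) {
        return ("token-for-" + hostPort).getBytes(StandardCharsets.UTF_8);
    }
}
```

Because the token is produced on demand rather than stored in auth_info, both problems above disappear: the token can differ per server, and a reconnect naturally acquires a new one.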
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930779#action_12930779 ] Mahadev konar commented on ZOOKEEPER-928: - good point Flavio! I totally forgot about that. That should prevent this failure case. Vishal, your thoughts? Follower should stop following and start FLE if it does not receive pings from the leader - Key: ZOOKEEPER-928 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Critical Fix For: 3.3.3, 3.4.0 In Follower.followLeader(), after syncing with the leader, the follower does: while (self.isRunning()) { readPacket(qp); processPacket(qp); } It looks like it relies on socket timeout expiry to figure out if the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if a Leader has a software hang, but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the Leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader. We should keep track of pings received from the leader and see if we haven't seen a ping packet from the leader for (syncLimit * tickTime) time, and give up following the leader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930785#action_12930785 ] Mahadev konar commented on ZOOKEEPER-928: - Vishal, here is the definition of setSoTimeout - {code} public void setSoTimeout(int timeout) throws SocketException Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds. With this option set to a non-zero timeout, a read() call on the InputStream associated with this Socket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the Socket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout. {code} This means that the read would block till the timeout and throw an exception if it doesn't hear from the leader during that time. Wouldn't this suffice? Follower should stop following and start FLE if it does not receive pings from the leader - Key: ZOOKEEPER-928 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Critical Fix For: 3.3.3, 3.4.0 In Follower.followLeader(), after syncing with the leader, the follower does: while (self.isRunning()) { readPacket(qp); processPacket(qp); } It looks like it relies on socket timeout expiry to figure out if the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if a Leader has a software hang, but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the Leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader.
We should keep track of pings received from the leader and see if we haven't seen a ping packet from the leader for (syncLimit * tickTime) time, and give up following the leader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
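The setSoTimeout behavior quoted above can be demonstrated with a self-contained socket pair: with SO_TIMEOUT set, a read on a silent-but-still-open connection throws SocketTimeoutException instead of blocking forever, which is what lets a follower notice a hung leader even when no client traffic flows. A minimal sketch (not the Follower code itself):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// The "server" plays a hung leader: it accepts the connection but never
// writes. The "follower" side sets SO_TIMEOUT, so its blocking read is
// bounded and raises SocketTimeoutException while the socket stays valid.
class SoTimeoutDemo {
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             Socket follower = new Socket("127.0.0.1", server.getLocalPort());
             Socket leader = server.accept()) {       // leader never writes
            follower.setSoTimeout(timeoutMillis);
            try {
                follower.getInputStream().read();     // blocks at most timeoutMillis
                return false;
            } catch (SocketTimeoutException expected) {
                return true;                          // socket is still connected here
            }
        }
    }
}
```

This is the mechanism Mahadev points at: the follower's blocking read itself enforces an upper bound on silence from the leader, independent of client heartbeats.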
[jira] Resolved: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar resolved ZOOKEEPER-928. - Resolution: Won't Fix Fix Version/s: (was: 3.3.3) (was: 3.4.0) No worries Vishal. Resolving the issue as won't fix. Follower should stop following and start FLE if it does not receive pings from the leader - Key: ZOOKEEPER-928 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Critical In Follower.followLeader(), after syncing with the leader, the follower does: while (self.isRunning()) { readPacket(qp); processPacket(qp); } It looks like it relies on socket timeout expiry to figure out if the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if a Leader has a software hang, but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the Leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader. We should keep track of pings received from the leader and see if we haven't seen a ping packet from the leader for (syncLimit * tickTime) time, and give up following the leader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-926) Fork Hadoop common's test-patch.sh and modify for Zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-926: Fix Version/s: 3.4.0 Fork Hadoop common's test-patch.sh and modify for Zookeeper --- Key: ZOOKEEPER-926 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-926 Project: Zookeeper Issue Type: Improvement Components: build Reporter: Nigel Daley Fix For: 3.4.0 Attachments: ZOOKEEPER-926.patch Zookeeper currently uses the test-patch.sh script from the Hadoop nightly dir. This is now out of date. I propose we just copy the updated one in Hadoop common and then modify it for ZK. This will also help as ZK moves out of Hadoop to its own TLP. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-900: Fix Version/s: 3.4.0 FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Flavio Junqueira Priority: Critical Fix For: 3.4.0 From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires it interrupts the AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SenderWorker and RecvWorker threads since they block IO to the respective peer. b) The blocking IO problem is not just restricted to connectOne(), but also appears in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener.
In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also, the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
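The "real fix" the report asks for, a connect with a hard upper bound done without a helper thread and timer, can be sketched with a non-blocking SocketChannel and a Selector. Illustrative only, not the QuorumCnxManager code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Non-blocking connect with a timeout: the calling thread never blocks in
// connect() itself, only in select(timeoutMillis), so a dead or slow peer
// costs at most timeoutMillis instead of stalling the Listener.
class NonBlockingConnect {
    static boolean connectWithin(InetSocketAddress addr, long timeoutMillis) throws IOException {
        SocketChannel ch = SocketChannel.open();
        try (Selector selector = Selector.open()) {
            ch.configureBlocking(false);
            if (ch.connect(addr)) {
                return true;                 // connected immediately (possible on loopback)
            }
            ch.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(timeoutMillis) == 0) {
                ch.close();                  // timed out: give up cleanly
                return false;
            }
            return ch.finishConnect();       // completes (or throws) without blocking
        }
    }
}
```

The same Selector loop can multiplex accepts and reads as well, which is what removes the Listener's single-peer bottleneck described above.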
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930225#action_12930225 ] Mahadev konar commented on ZOOKEEPER-900: - This is definitely a good step in cleaning up QuorumCnxManager! I have updated the jira to mark it for the 3.4 release. Vishal, is that ok with you? Would you be able to provide a patch for the 3.4 release? FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires it interrupts the AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SenderWorker and RecvWorker threads since they block IO to the respective peer. b) The blocking IO problem is not just restricted to connectOne(), but also appears in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get the peer's info (s.read(msgBuffer)).
Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also, the code has an inherent cycle. initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-918) Review of BookKeeper Documentation (Sequence flow and failure scenarios)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-918: Fix Version/s: 3.4.0 3.3.3 Review of BookKeeper Documentation (Sequence flow and failure scenarios) Key: ZOOKEEPER-918 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-918 Project: Zookeeper Issue Type: Task Components: documentation Reporter: Amit Jaiswal Priority: Trivial Fix For: 3.3.3, 3.4.0 Attachments: BookKeeperInternals.pdf Original Estimate: 2h Remaining Estimate: 2h I have prepared a document describing some of the internals of bookkeeper in terms of: 1. Sequence of operations 2. Files layout 3. Failure scenarios The document was prepared mostly by reading the code. Could somebody who understands the design review it? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-917: Fix Version/s: 3.4.0 3.3.3 Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-915) Errors that happen during sync() processing at the leader do not get propagated back to the client.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-915: Fix Version/s: 3.4.0 3.3.3 Errors that happen during sync() processing at the leader do not get propagated back to the client. --- Key: ZOOKEEPER-915 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-915 Project: Zookeeper Issue Type: Bug Reporter: Benjamin Reed Fix For: 3.3.3, 3.4.0 If an error in sync() processing happens at the leader (SESSION_MOVED for example), they are not propagated back to the client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927020#action_12927020 ] Mahadev konar commented on ZOOKEEPER-907: - That's ok with me, Ben. Can you review and +1 the patch if it's all good to go? Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 The sync request does not set the session owner in Request. As a result, the leader keeps printing: 2010-07-01 10:55:36,733 - INFO [ProcessThread:-1:preprequestproces...@405] - Got user-level KeeperException when processing sessionid:0x298d3b1fa9 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error Path:null Error:KeeperErrorCode = Session moved -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-872) Small fixes to PurgeTxnLog
[ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-872: Status: Patch Available (was: Open) Small fixes to PurgeTxnLog --- Key: ZOOKEEPER-872 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-872 PurgeTxnLog forces us to have at least 2 backups (by requiring count >= 3). Also, it prints to stdout instead of using a Logger. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-850) Switch from log4j to slf4j
[ https://issues.apache.org/jira/browse/ZOOKEEPER-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-850: Fix Version/s: 3.4.0 Assignee: Olaf Krische Switch from log4j to slf4j -- Key: ZOOKEEPER-850 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-850 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Olaf Krische Assignee: Olaf Krische Fix For: 3.4.0 Attachments: ZOOKEEPER-3.3.1-log4j-slf4j-20101031.patch.bz2 Hello, I would like to see slf4j integrated into zookeeper instead of relying explicitly on log4j. slf4j is an abstract logging framework. There are adapters from slf4j to many logger implementations, one of them being log4j. I don't like making the decision of which log engine to use so early. This would help me to embed zookeeper in my own applications (which use a different logger implementation, but slf4j is the basis). What do you think? (As I can see, such slf4j requests flood all the other Apache projects as well :-) Maybe for 3.4 or 4.0? I can offer a patchset; I have experience with such a migration already. :-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-897) C Client seg faults during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925858#action_12925858 ] Mahadev konar commented on ZOOKEEPER-897: - jared, pat, I am ok without a test case for this one, because it's quite hard to create one. I just wanted someone else to run the tests on their machines just to verify (since I rarely see any problems in c tests on my machine). I will go ahead and commit this patch for now. C Client seg faults during close Key: ZOOKEEPER-897 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-897 Project: Zookeeper Issue Type: Bug Components: c client Reporter: Jared Cantwell Assignee: Jared Cantwell Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEEPER-897.diff, ZOOKEEPER-897.patch We observed a crash while closing our c client. It was in the do_io() thread that was processing during the close() call. #0 queue_buffer (list=0x6bd4f8, b=0x0, add_to_front=0) at src/zookeeper.c:969 #1 0x0046234e in check_events (zh=0x6bd480, events=<value optimized out>) at src/zookeeper.c:1687 #2 0x00462d74 in zookeeper_process (zh=0x6bd480, events=2) at src/zookeeper.c:1971 #3 0x00469c34 in do_io (v=0x6bd480) at src/mt_adaptor.c:311 #4 0x77bc59ca in start_thread () from /lib/libpthread.so.0 #5 0x76f706fd in clone () from /lib/libc.so.6 #6 0x in ?? () We tracked down the sequence of events, and the cause is that input_buffer is being freed from a thread other than the do_io thread that relies on it: 1. do_io() calls check_events() 2. the if (events & ZOOKEEPER_READ) branch executes 3. the if (rc > 0) branch executes 4. the if (zh->input_buffer != zh->primer_buffer) branch executes ...in the meantime... 5. zookeeper_close() is called 6. the if (inc_ref_counter(zh,0)!=0) branch executes 7. cleanup_bufs() is called 8. input_buffer is freed at the end ...back in check_events()... 9. queue_buffer() is called on a NULL buffer. I believe the patch is to only call free_completions() in zookeeper_close() and not cleanup_bufs().
The original reason cleanup_bufs() was added was to call any outstanding synchronous completions, so only free_completions() (which is guarded) is needed. I will submit a patch for review with this change. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-897) C Client seg faults during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-897: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I just committed this. Thanks Jared! C Client seg faults during close Key: ZOOKEEPER-897 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-897 Project: Zookeeper Issue Type: Bug Components: c client Reporter: Jared Cantwell Assignee: Jared Cantwell Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEEPER-897.diff, ZOOKEEPER-897.patch We observed a crash while closing our c client. It was in the do_io() thread that was processing during the close() call. #0 queue_buffer (list=0x6bd4f8, b=0x0, add_to_front=0) at src/zookeeper.c:969 #1 0x0046234e in check_events (zh=0x6bd480, events=<value optimized out>) at src/zookeeper.c:1687 #2 0x00462d74 in zookeeper_process (zh=0x6bd480, events=2) at src/zookeeper.c:1971 #3 0x00469c34 in do_io (v=0x6bd480) at src/mt_adaptor.c:311 #4 0x77bc59ca in start_thread () from /lib/libpthread.so.0 #5 0x76f706fd in clone () from /lib/libc.so.6 #6 0x in ?? () We tracked down the sequence of events, and the cause is that input_buffer is being freed from a thread other than the do_io thread that relies on it: 1. do_io() calls check_events() 2. the if (events & ZOOKEEPER_READ) branch executes 3. the if (rc > 0) branch executes 4. the if (zh->input_buffer != zh->primer_buffer) branch executes ...in the meantime... 5. zookeeper_close() is called 6. the if (inc_ref_counter(zh,0)!=0) branch executes 7. cleanup_bufs() is called 8. input_buffer is freed at the end ...back in check_events()... 9. queue_buffer() is called on a NULL buffer. I believe the patch is to only call free_completions() in zookeeper_close() and not cleanup_bufs(). The original reason cleanup_bufs() was added was to call any outstanding synchronous completions, so only free_completions() (which is guarded) is needed. I will submit a patch for review with this change.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
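The race above boils down to guarded versus unguarded cleanup: free_completions() takes a lock before touching shared state, while cleanup_bufs() frees input_buffer with the do_io thread still using it. As a rough illustration of why the guarded path is safe to call from a closing thread, here is a minimal Java sketch of a lock-guarded completion drain; the names only loosely mirror the C client and are not its real API.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: a completion list whose drain is lock-guarded, so it
// can safely be called from a closing thread while an io thread is still
// running. This mirrors the shape of free_completions(), not its code.
class GuardedCompletions {
    private final ReentrantLock lock = new ReentrantLock();
    private final ArrayDeque<Runnable> completions = new ArrayDeque<>();

    void add(Runnable c) {
        lock.lock();
        try {
            completions.add(c);
        } finally {
            lock.unlock();
        }
    }

    // Every access to the shared list happens under the lock, unlike an
    // unguarded cleanup_bufs()-style free of a buffer another thread holds.
    void drain() {
        lock.lock();
        try {
            Runnable c;
            while ((c = completions.poll()) != null) {
                c.run();
            }
        } finally {
            lock.unlock();
        }
    }
}
```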
[jira] Commented: (ZOOKEEPER-898) C Client might not cleanup correctly during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925909#action_12925909 ] Mahadev konar commented on ZOOKEEPER-898: - +1, good catch jared. I just committed this to 3.3 and trunk. thanks! C Client might not cleanup correctly during close - Key: ZOOKEEPER-898 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-898 Project: Zookeeper Issue Type: Bug Components: c client Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEEPER-898.diff, ZOOKEEPER-898.patch I was looking through the c-client code and noticed a situation where a counter can be incorrectly incremented and a small memory leak can occur. In zookeeper.c : add_completion(), if close_requested is true, then the completion will not be queued. But at the end, outstanding_sync is still incremented and free() is never called on the newly allocated completion_list_t. I will submit for review a diff that I believe corrects this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
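The bookkeeping rule the report calls for is: bump the counter (and keep the completion) only when the completion is actually queued; on the close_requested path, release it and leave the counter alone. A minimal Java sketch of that invariant, with names that loosely mirror the C client but are otherwise invented:

```java
import java.util.ArrayDeque;

// Illustrative sketch: outstandingSync is incremented only on the path
// that actually queues the completion. On the close_requested path the
// caller discards the completion (in the C client it would free() the
// completion_list_t) and the counter stays untouched.
class CompletionBookkeeping {
    final ArrayDeque<Object> completions = new ArrayDeque<>();
    int outstandingSync = 0;
    boolean closeRequested = false;

    boolean addCompletion(Object c) {
        if (closeRequested) {
            // Fixed path: nothing queued, nothing counted.
            return false;
        }
        completions.add(c);
        outstandingSync++; // incremented only when the completion is queued
        return true;
    }
}
```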
[jira] Updated: (ZOOKEEPER-898) C Client might not cleanup correctly during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-898: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) C Client might not cleanup correctly during close - Key: ZOOKEEPER-898 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-898 Project: Zookeeper Issue Type: Bug Components: c client Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEEPER-898.diff, ZOOKEEPER-898.patch I was looking through the c-client code and noticed a situation where a counter can be incorrectly incremented and a small memory leak can occur. In zookeeper.c : add_completion(), if close_requested is true, then the completion will not be queued. But at the end, outstanding_sync is still incremented and free() is never called on the newly allocated completion_list_t. I will submit for review a diff that I believe corrects this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-517) NIO factory fails to close connections when the number of file handles run out.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-517: Fix Version/s: (was: 3.3.2) 3.3.3 moving this out to 3.3.3 and 3.4 for investigation. NIO factory fails to close connections when the number of file handles run out. --- Key: ZOOKEEPER-517 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-517 Project: Zookeeper Issue Type: Bug Components: server Reporter: Mahadev konar Assignee: Benjamin Reed Priority: Critical Fix For: 3.3.3, 3.4.0 The code in the NIO factory is such that if we fail to accept a connection for some reason (too many open file handles may be one of them), we do not close the connections that are in CLOSE_WAIT. We need to call an explicit close on these sockets. One of the solutions might be to move doIO before accept so that we can still close connections even if we cannot accept new ones. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
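The defensive pattern the report asks for — never leave a socket dangling in CLOSE_WAIT when it cannot be fully registered — can be shown in plain NIO. This is a hedged sketch with invented names, not the actual ZooKeeper NIO factory code:

```java
import java.io.IOException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Illustrative sketch: if anything fails after accept() succeeds (e.g.
// registering the channel when file handles are exhausted), close the
// accepted socket explicitly instead of leaking it in CLOSE_WAIT.
class AcceptSketch {
    static SocketChannel acceptOrClose(ServerSocketChannel server) throws IOException {
        SocketChannel sc = server.accept();
        if (sc == null) {
            return null; // non-blocking accept with nothing pending
        }
        try {
            sc.configureBlocking(false); // stand-in for post-accept setup
            return sc;
        } catch (IOException e) {
            sc.close(); // any failure past accept() must not leak the socket
            throw e;
        }
    }
}
```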
[jira] Commented: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926004#action_12926004 ] Mahadev konar commented on ZOOKEEPER-907: - ben, is ZOOKEEPER-915 also marked for 3.3.2 then? Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 The sync request does not set the session owner in Request. As a result, the leader keeps printing: 2010-07-01 10:55:36,733 - INFO [ProcessThread:-1:preprequestproces...@405] - Got user-level KeeperException when processing sessionid:0x298d3b1fa9 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error Path:null Error:KeeperErrorCode = Session moved -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-884: Fix Version/s: 3.4.0 Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Fix For: 3.4.0 Attachments: ZOOKEEPER-884.patch We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-907: Status: Patch Available (was: Open) Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 The sync request does not set the session owner in Request. As a result, the leader keeps printing: 2010-07-01 10:55:36,733 - INFO [ProcessThread:-1:preprequestproces...@405] - Got user-level KeeperException when processing sessionid:0x298d3b1fa9 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error Path:null Error:KeeperErrorCode = Session moved -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-702: Fix Version/s: 3.4.0 GSoC 2010: Failure Detector Model - Key: ZOOKEEPER-702 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702 Project: Zookeeper Issue Type: Wish Reporter: Henry Robinson Assignee: Abmar Barros Fix For: 3.4.0 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch Failure Detector Module Possible Mentor Henry Robinson (henry at apache dot org) Requirements Java, some distributed systems knowledge, comfort implementing distributed systems protocols Description ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. 
This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-904) super digest is not actually acting as a full superuser
[ https://issues.apache.org/jira/browse/ZOOKEEPER-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925087#action_12925087 ] Mahadev konar commented on ZOOKEEPER-904: - good catch. +1 for the patch. I'll run ant test and will commit to both 3.3.2 and 3.4. super digest is not actually acting as a full superuser --- Key: ZOOKEEPER-904 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-904-332.patch, ZOOKEEPER-904.patch The documentation states: New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a super user. In particular no ACL checking occurs for a user authenticated as super. However, if a super user does something like: zk.setACL("/", Ids.READ_ACL_UNSAFE, -1); the super user is now bound by the read-only ACL. This is not what I would expect to see given the documentation. It can be fixed by moving the check for the super authId in PrepRequestProcessor.checkACL to before the for(ACL a : acl) loop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
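The shape of the fix described above — move the super check before the per-ACL loop so a restrictive ACL can never bind the super user — can be sketched in a few lines. The class and method names below are illustrative stand-ins, not the real PrepRequestProcessor code:

```java
import java.util.List;

// Minimal sketch of the ordering fix: grant everything to a "super"
// authenticated id *before* consulting the ACL list; otherwise a
// read-only ACL on "/" would bind the super user too.
class SuperAclSketch {
    static class Id {
        final String scheme, id;
        Id(String scheme, String id) { this.scheme = scheme; this.id = id; }
    }
    static class Acl {
        final int perms; final Id id;
        Acl(int perms, Id id) { this.perms = perms; this.id = id; }
    }

    static final int READ = 1, WRITE = 2;

    static boolean checkAcl(List<Acl> acls, int perm, List<Id> authIds) {
        // The fix: a super-authenticated session bypasses ACL checks entirely.
        for (Id auth : authIds) {
            if ("super".equals(auth.scheme)) {
                return true;
            }
        }
        // Normal path: some ACL must match an auth id and grant the permission.
        for (Acl a : acls) {
            for (Id auth : authIds) {
                if (a.id.scheme.equals(auth.scheme)
                        && a.id.id.equals(auth.id)
                        && (a.perms & perm) != 0) {
                    return true;
                }
            }
        }
        return false;
    }
}
```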
[jira] Commented: (ZOOKEEPER-87) Follower does not shut itself down if its too far behind the leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925165#action_12925165 ] Mahadev konar commented on ZOOKEEPER-87: exactly vishal. The problem is to avoid clients being connected to a server which can lag behind the leader. Follower does not shut itself down if its too far behind the leader. Key: ZOOKEEPER-87 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-87 Project: Zookeeper Issue Type: Bug Components: quorum Reporter: Mahadev konar Assignee: Mahadev konar Priority: Critical Fix For: 3.4.0 Currently, if the follower is lagging behind but keeps sending pings to the leader, it will stay alive and keep getting further and further behind the leader. The follower should shut itself down if it is not able to keep up with the leader within some limit, so that guarantees about updates can be made to the clients connected to different servers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
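The safeguard discussed above amounts to a bounded-lag check. As a purely hypothetical sketch (the names and the zxid-difference heuristic are assumptions for illustration, not ZooKeeper's actual mechanism):

```java
// Hypothetical sketch: a follower that trails the leader by more than a
// configured number of transactions shuts itself down rather than keep
// serving clients increasingly stale data.
class FollowerLagSketch {
    static boolean shouldShutdown(long leaderZxid, long followerZxid, long maxLagTxns) {
        return leaderZxid - followerZxid > maxLagTxns;
    }
}
```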
[jira] Commented: (ZOOKEEPER-904) super digest is not actually acting as a full superuser
[ https://issues.apache.org/jira/browse/ZOOKEEPER-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925177#action_12925177 ] Mahadev konar commented on ZOOKEEPER-904: - Ran the tests, they pass. I just committed this to 3.3 and trunk. thanks Camille. super digest is not actually acting as a full superuser --- Key: ZOOKEEPER-904 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-904-332.patch, ZOOKEEPER-904.patch The documentation states: New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a super user. In particular no ACL checking occurs for a user authenticated as super. However, if a super user does something like: zk.setACL("/", Ids.READ_ACL_UNSAFE, -1); the super user is now bound by the read-only ACL. This is not what I would expect to see given the documentation. It can be fixed by moving the check for the super authId in PrepRequestProcessor.checkACL to before the for(ACL a : acl) loop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-904) super digest is not actually acting as a full superuser
[ https://issues.apache.org/jira/browse/ZOOKEEPER-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-904: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) super digest is not actually acting as a full superuser --- Key: ZOOKEEPER-904 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-904-332.patch, ZOOKEEPER-904.patch The documentation states: New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a super user. In particular no ACL checking occurs for a user authenticated as super. However, if a super user does something like: zk.setACL("/", Ids.READ_ACL_UNSAFE, -1); the super user is now bound by the read-only ACL. This is not what I would expect to see given the documentation. It can be fixed by moving the check for the super authId in PrepRequestProcessor.checkACL to before the for(ACL a : acl) loop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-805) four letter words fail with latest ubuntu nc.openbsd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924732#action_12924732 ] Mahadev konar commented on ZOOKEEPER-805: - sure, do you want to add some documentation to the zookeeper admin guide to make it clearer on using -q and the issue with openbsd? four letter words fail with latest ubuntu nc.openbsd Key: ZOOKEEPER-805 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-805 Project: Zookeeper Issue Type: Bug Components: documentation, server Affects Versions: 3.3.1, 3.4.0 Reporter: Patrick Hunt Priority: Critical Fix For: 3.3.2, 3.4.0 In both the 3.3 branch and trunk, echo stat|nc localhost 2181 fails against the ZK server on Ubuntu Lucid Lynx. I noticed this after upgrading to lucid lynx - which is now shipping openbsd nc as the default: OpenBSD netcat (Debian patchlevel 1.89-3ubuntu2) vs nc traditional [v1.10-38] which works fine. Not sure if this is a bug in us or nc.openbsd, but it's currently not working for me. Ugh. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-897) C Client seg faults during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-897: Status: Open (was: Patch Available) jared, The patch that you provided leaks memory for the zookeeper client. We have to clean up the to_send and to_process buffers on close and free them. Which release did you observe the problem with? I had tried to fix all the issues with zookeeper_close() in ZOOKEEPER-591. Also, michi has fixed a couple of other issues in ZOOKEEPER-804. what version of code are you running? Also, can you provide some test case which causes this issue? (I know it's hard to reproduce, but even a test that reproduces it once in 10-20 times is good enough). C Client seg faults during close Key: ZOOKEEPER-897 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-897 Project: Zookeeper Issue Type: Bug Components: c client Reporter: Jared Cantwell Assignee: Jared Cantwell Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEEPER-897.diff, ZOOKEEPER-897.patch We observed a crash while closing our c client. It was in the do_io() thread that was still processing during the close() call. #0 queue_buffer (list=0x6bd4f8, b=0x0, add_to_front=0) at src/zookeeper.c:969 #1 0x0046234e in check_events (zh=0x6bd480, events=<value optimized out>) at src/zookeeper.c:1687 #2 0x00462d74 in zookeeper_process (zh=0x6bd480, events=2) at src/zookeeper.c:1971 #3 0x00469c34 in do_io (v=0x6bd480) at src/mt_adaptor.c:311 #4 0x77bc59ca in start_thread () from /lib/libpthread.so.0 #5 0x76f706fd in clone () from /lib/libc.so.6 #6 0x in ?? () We tracked down the sequence of events, and the cause is that input_buffer is being freed from a thread other than the do_io thread that relies on it: 1. do_io() calls check_events() 2. the if (events & ZOOKEEPER_READ) branch executes 3. the if (rc > 0) branch executes 4. the if (zh->input_buffer != zh->primer_buffer) branch executes ...in the meantime... 5. zookeeper_close() is called 6. the if (inc_ref_counter(zh,0)!=0) branch executes 7. cleanup_bufs() is called 8. input_buffer is freed at the end ...back in check_events()... 9. queue_buffer() is called on a NULL buffer. I believe the fix is to only call free_completions() in zookeeper_close() and not cleanup_bufs(). The original reason cleanup_bufs() was added was to call any outstanding synchronous completions, so only free_completions (which is guarded) is needed. I will submit a patch for review with this change. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-903) Create a testing jar with useful classes from ZK test source
[ https://issues.apache.org/jira/browse/ZOOKEEPER-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-903: Fix Version/s: 3.4.0 Create a testing jar with useful classes from ZK test source Key: ZOOKEEPER-903 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-903 Project: Zookeeper Issue Type: Improvement Components: tests Reporter: Camille Fournier Fix For: 3.4.0 From mailing list: -Original Message- From: Benjamin Reed Sent: Monday, October 18, 2010 11:12 AM To: zookeeper-u...@hadoop.apache.org Subject: Re: Testing zookeeper outside the source distribution? we should be exposing those classes and releasing them as a testing jar. do you want to open up a jira to track this issue? ben On 10/18/2010 05:17 AM, Anthony Urso wrote: Anyone have any pointers on how to test against ZK outside of the source distribution? All the fun classes (e.g. ClientBase) do not make it into the ZK release jar. Right now I am manually running a ZK node for the unit tests to connect to prior to running my test, but I would rather have something that ant could reliably automate starting and stopping for CI. Thanks, Anthony -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-904) super digest is not actually acting as a full superuser
[ https://issues.apache.org/jira/browse/ZOOKEEPER-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-904: Fix Version/s: 3.4.0 super digest is not actually acting as a full superuser --- Key: ZOOKEEPER-904 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.4.0 The documentation states: New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a super user. In particular no ACL checking occurs for a user authenticated as super. However, if a super user does something like: zk.setACL("/", Ids.READ_ACL_UNSAFE, -1); the super user is now bound by the read-only ACL. This is not what I would expect to see given the documentation. It can be fixed by moving the check for the super authId in PrepRequestProcessor.checkACL to before the for(ACL a : acl) loop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-805) four letter words fail with latest ubuntu nc.openbsd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924021#action_12924021 ] Mahadev konar commented on ZOOKEEPER-805: - Pat, You think this should go into 3.3.2? four letter words fail with latest ubuntu nc.openbsd Key: ZOOKEEPER-805 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-805 Project: Zookeeper Issue Type: Bug Components: documentation, server Affects Versions: 3.3.1, 3.4.0 Reporter: Patrick Hunt Priority: Critical Fix For: 3.3.2, 3.4.0 In both 3.3 branch and trunk echo stat|nc localhost 2181 fails against the ZK server on Ubuntu Lucid Lynx. I noticed this after upgrading to lucid lynx - which is now shipping openbsd nc as the default: OpenBSD netcat (Debian patchlevel 1.89-3ubuntu2) vs nc traditional [v1.10-38] which works fine. Not sure if this is a bug in us or nc.openbsd, but it's currently not working for me. Ugh. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-800) zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-800: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1 I just committed this. thanks michi. zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE --- Key: ZOOKEEPER-800 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-800 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Reporter: Michi Mutsuzaki Assignee: Michi Mutsuzaki Priority: Minor Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-800.patch This happened when I called zoo_add_auth() immediately after zookeeper_init(). It took me a while to figure out that authentication actually failed since zoo_add_auth() returned ZOK. It should return ZINVALIDSTATE instead. --Michi -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921053#action_12921053 ] Mahadev konar commented on ZOOKEEPER-823: - I am +1 on creating a branch for this issue, keeping work going on it until we have all the issues figured out, and then merging it back in. update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, testDisconnectedAddAuth_FAILURE, testWatchAutoResetWithPending_FAILURE, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch This jira will port the client side connection code to use netty rather than direct nio. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918112#action_12918112 ] Mahadev konar commented on ZOOKEEPER-804: - +1 for the patch. Pat, can you please try it out and see if it fixes the test cases? I will go ahead and commit if it passes for you. c unit tests failing due to assertion cptr failed --- Key: ZOOKEEPER-804 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-804 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.4.0 Environment: gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel) Reporter: Patrick Hunt Assignee: Michi Mutsuzaki Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-804.patch I'm seeing this frequently: [exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK [exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK [exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 25687 : OK [exec] zktest-mt: /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: zookeeper_process: Assertion `cptr' failed. [exec] make: *** [run-check] Aborted [exec] Zookeeper_simpleSystem::testHangingClient Mahadev can you take a look? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918113#action_12918113 ] Mahadev konar commented on ZOOKEEPER-767: - really good to see this. I'll try and review the code as soon as possible. Submitting Demo/Recipe Shared / Exclusive Lock Code --- Key: ZOOKEEPER-767 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767 Project: Zookeeper Issue Type: Improvement Components: recipes Affects Versions: 3.3.0 Reporter: Sam Baskinger Assignee: Sam Baskinger Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, ZOOKEEPER-767.patch Time Spent: 8h Networked Insights would like to share-back some code for shared/exclusive locking that we are using in our labs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-820) update c unit tests to ensure zombie java server processes don't cause failure
[ https://issues.apache.org/jira/browse/ZOOKEEPER-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918144#action_12918144 ] Mahadev konar commented on ZOOKEEPER-820: - this does fix the if problem in the script (which took me some time to realise was backwards :) ). Michi, can you also add the lsof check as follows: {code} if which lsof > /dev/null 2>&1; then run the command to kill the process using lsof; else we don't do anything; fi {code} Note that this is in addition to using pid checks. We can do the following on stop server: - kill using the pid file first - kill using lsof, if any such process is present. update c unit tests to ensure zombie java server processes don't cause failure Key: ZOOKEEPER-820 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-820 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Patrick Hunt Assignee: Michi Mutsuzaki Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-820-1.patch, ZOOKEEPER-820.patch, ZOOKEEPER-820.patch When the c unit tests are run, sometimes the server doesn't shut down at the end of the test; this causes subsequent tests (hudson esp.) to fail. 1) we should try harder to make the server shut down at the end of the test, I suspect this is related to test failing/cleanup 2) before the tests are run we should see if the old server is still running and try to shut it down -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-804: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I just committed this. thanks michi! c unit tests failing due to assertion cptr failed --- Key: ZOOKEEPER-804 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-804 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.4.0 Environment: gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel) Reporter: Patrick Hunt Assignee: Michi Mutsuzaki Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-804.patch I'm seeing this frequently: [exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK [exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK [exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 25687 : OK [exec] zktest-mt: /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: zookeeper_process: Assertion `cptr' failed. [exec] make: *** [run-check] Aborted [exec] Zookeeper_simpleSystem::testHangingClient Mahadev can you take a look? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-820) update c unit tests to ensure zombie java server processes don't cause failure
[ https://issues.apache.org/jira/browse/ZOOKEEPER-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909306#action_12909306 ] Mahadev konar commented on ZOOKEEPER-820: - The only comment I have is that these scripts might not work on cygwin. Let me try and check how lsof works on cygwin/windows. update c unit tests to ensure zombie java server processes don't cause failure Key: ZOOKEEPER-820 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-820 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Patrick Hunt Assignee: Michi Mutsuzaki Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-820-1.patch, ZOOKEEPER-820.patch When the c unit tests are run, sometimes the server doesn't shut down at the end of the test; this causes subsequent tests (hudson esp.) to fail. 1) we should try harder to make the server shut down at the end of the test, I suspect this is related to test failing/cleanup 2) before the tests are run we should see if the old server is still running and try to shut it down -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-870) Zookeeper trunk build broken.
Zookeeper trunk build broken. - Key: ZOOKEEPER-870 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-870 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.4.0 the current zookeeper trunk build is broken, mostly due to some netty changes. This is causing a huge backlog of PAs and other impediments to the review process. For now I plan to disable the tests and fix them as part of 3.4 later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-871) ClientTest testClientCleanup is failing due to high fd count.
ClientTest testClientCleanup is failing due to high fd count. - Key: ZOOKEEPER-871 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-871 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Priority: Blocker Fix For: 3.4.0 The fd count has increased. The tests are repeatedly failing on the hudson machines. I think this is probably related to the netty server changes. We have to fix this before we release 3.4. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-870) Zookeeper trunk build broken.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909506#action_12909506 ] Mahadev konar commented on ZOOKEEPER-870: - Testcase: testHammer took 81.115 sec FAILED node count not consistent expected:1771 but was:0 junit.framework.AssertionFailedError: node count not consistent expected:1771 but was:0 at org.apache.zookeeper.test.ClientBase.verifyRootOfAllServersMatch(ClientBase.java:581) at org.apache.zookeeper.test.AsyncHammerTest.testHammer(AsyncHammerTest.java:190) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:51) The test case [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 107.528 sec [junit] Test org.apache.zookeeper.test.NioNettySuiteHammerTest FAILED also fails. Zookeeper trunk build broken. - Key: ZOOKEEPER-870 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-870 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.4.0 the zookeeper current trunk build is broken mostly due to some netty changes. This is causing a huge backlog of PA's and other impediments to the review process. For now I plan to disable the test and fix them as part of 3.4 later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-871) ClientTest testClientCleanup is failing due to high fd count.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909519#action_12909519 ] Mahadev konar commented on ZOOKEEPER-871: - Also: {code} Testcase: testHammer took 81.115 sec FAILED node count not consistent expected:1771 but was:0 junit.framework.AssertionFailedError: node count not consistent expected:1771 but was:0 at org.apache.zookeeper.test.ClientBase.verifyRootOfAllServersMatch(ClientBase.java:581) at org.apache.zookeeper.test.AsyncHammerTest.testHammer(AsyncHammerTest.java:190) [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 107.528 sec [junit] Test org.apache.zookeeper.test.NioNettySuiteHammerTest FAILED {code} ClientTest testClientCleanup is failing due to high fd count. - Key: ZOOKEEPER-871 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-871 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Priority: Blocker Fix For: 3.4.0 The fd counts has increased. The tests are repeatedly failing on hudson machines. I probably think this is related to netty server changes. We have to fix this before we release 3.4 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-870) Zookeeper trunk build broken.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-870: Attachment: ZOOKEEPER-870.patch This patch ignores the following assertions for now: - the count of fds in ClientTest - the count of nodes in ClientBase These changes will be reverted in ZOOKEEPER-871 before 3.4 is released. This is just a patch to get the patch process running so that review is done on time. Zookeeper trunk build broken. - Key: ZOOKEEPER-870 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-870 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.4.0 Attachments: ZOOKEEPER-870.patch the zookeeper current trunk build is broken mostly due to some netty changes. This is causing a huge backlog of PA's and other impediments to the review process. For now I plan to disable the test and fix them as part of 3.4 later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-870) Zookeeper trunk build broken.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-870: Status: Patch Available (was: Open) Zookeeper trunk build broken. - Key: ZOOKEEPER-870 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-870 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.4.0 Attachments: ZOOKEEPER-870.patch the zookeeper current trunk build is broken mostly due to some netty changes. This is causing a huge backlog of PA's and other impediments to the review process. For now I plan to disable the test and fix them as part of 3.4 later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-870) Zookeeper trunk build broken.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-870: Attachment: ZOOKEEPER-870.patch updated patch with comments incorporated. Zookeeper trunk build broken. - Key: ZOOKEEPER-870 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-870 Project: Zookeeper Issue Type: Bug Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.4.0 Attachments: ZOOKEEPER-870.patch, ZOOKEEPER-870.patch the zookeeper current trunk build is broken mostly due to some netty changes. This is causing a huge backlog of PA's and other impediments to the review process. For now I plan to disable the test and fix them as part of 3.4 later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909558#action_12909558 ] Mahadev konar commented on ZOOKEEPER-822: - vishal, flavio, If there is just one thread running at one point in time, then it's ok. Also, I am really worried about the code structure in LeaderElection.java. It's ok to have a temporary fix, but it would be great to see some commitment from someone on doing it right in 3.4. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909592#action_12909592 ] Mahadev konar commented on ZOOKEEPER-822: - vishal, I was expecting some commitment from you for making it use a selector :). Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-861) Missing the test SSL certificate used for running junit tests.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-861: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed I just committed this. thanks erwin! Missing the test SSL certificate used for running junit tests. -- Key: ZOOKEEPER-861 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-861 Project: Zookeeper Issue Type: Bug Components: contrib-hedwig Reporter: Erwin Tam Assignee: Erwin Tam Priority: Minor Fix For: 3.4.0 Attachments: server.p12, ZOOKEEPER-861.patch The Hedwig code checked into Apache is missing a test SSL certificate file used for running the server junit tests. We need this file otherwise the tests that use this (e.g. TestHedwigHub) will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-866) Adding no disk persistence option in zookeeper.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906521#action_12906521 ] Mahadev konar commented on ZOOKEEPER-866: - Thomas, we have spent the last 3 years optimizing the throughput and latency of zookeeper. I think we have reached the point of diminishing returns with this. I agree on the usability front you do have a point. But making it usable is orthogonal to what I propose over here. Both can take different directions. I am just trying to open a new area of usage for zookeeper. Does that make sense? Adding no disk persistence option in zookeeper. --- Key: ZOOKEEPER-866 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-866 Project: Zookeeper Issue Type: New Feature Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.4.0 Attachments: ZOOKEEPER-nodisk.patch Its been seen that some folks would like to use zookeeper for very fine grained locking. Also, in there use case they are fine with loosing all old zookeeper state if they reboot zookeeper or zookeeper goes down. The use case is more of a runtime locking wherein forgetting the state of locks is acceptable in case of a zookeeper reboot. Not logging to disk allows high throughput on and low latency on the writes to zookeeper. This would be a configuration option to set (ofcourse the default would be logging to disk). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-866) Adding no disk persistence option in zookeeper.
Adding no disk persistence option in zookeeper. --- Key: ZOOKEEPER-866 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-866 Project: Zookeeper Issue Type: New Feature Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.4.0 It's been seen that some folks would like to use zookeeper for very fine-grained locking. Also, in their use case they are fine with losing all old zookeeper state if they reboot zookeeper or zookeeper goes down. The use case is more of a runtime locking wherein forgetting the state of locks is acceptable in case of a zookeeper reboot. Not logging to disk allows high throughput and low latency on the writes to zookeeper. This would be a configuration option to set (of course the default would be logging to disk). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-866) Adding no disk persistence option in zookeeper.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-866: Attachment: ZOOKEEPER-nodisk.patch Here is a patch that I had worked on. This is not a complete patch since this actually changes the code and doesn't make it a configurable option. I will be creating another patch wherein there is a configuration option for this. Feedback is welcome. Adding no disk persistence option in zookeeper. --- Key: ZOOKEEPER-866 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-866 Project: Zookeeper Issue Type: New Feature Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 3.4.0 Attachments: ZOOKEEPER-nodisk.patch Its been seen that some folks would like to use zookeeper for very fine grained locking. Also, in there use case they are fine with loosing all old zookeeper state if they reboot zookeeper or zookeeper goes down. The use case is more of a runtime locking wherein forgetting the state of locks is acceptable in case of a zookeeper reboot. Not logging to disk allows high throughput on and low latency on the writes to zookeeper. This would be a configuration option to set (ofcourse the default would be logging to disk). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
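The throughput/latency claim in this issue comes down to the per-commit fsync that a persistent transaction log pays. A toy measurement of that cost can be sketched as below; `FsyncCost` is a hypothetical illustration, unrelated to the attached patch, and the record size is arbitrary.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;

// Toy sketch: append n fixed-size records to a file, optionally forcing each
// one to disk the way a durable transaction log must. Returns elapsed nanos.
public class FsyncCost {
    public static long appendRecords(Path p, int n, boolean sync) throws IOException {
        long start = System.nanoTime();
        try (FileOutputStream out = new FileOutputStream(p.toFile())) {
            byte[] rec = new byte[128];           // stand-in for a serialized txn
            for (int i = 0; i < n; i++) {
                out.write(rec);
                if (sync) out.getFD().sync();     // the durability cost being avoided
            }
        }
        return System.nanoTime() - start;
    }
}
```

On a typical disk the synced run is orders of magnitude slower per record, which is the headroom a no-persistence mode would reclaim for lock-style workloads that can tolerate losing state on reboot.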
[jira] Commented: (ZOOKEEPER-864) Hedwig C++ client improvements
[ https://issues.apache.org/jira/browse/ZOOKEEPER-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906102#action_12906102 ] Mahadev konar commented on ZOOKEEPER-864: - michi to answer your question, all we need is a careful review. Hedwig C++ client improvements -- Key: ZOOKEEPER-864 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-864 Project: Zookeeper Issue Type: Improvement Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.4.0 Attachments: ZOOKEEPER-864.diff I changed the socket code to use boost asio. Now the client only creates one thread, and all operations are non-blocking. Tests are now automated, just run make check. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-822: Assignee: Vishal K Fix Version/s: 3.3.2 3.4.0 Marking this for 3.3.2, to see if we want this included in 3.3.2. Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz Created a 3 node cluster. 1 Fail the ZK leader 2. Let leader election finish. Restart the leader and let it join the 3. Repeat After a few rounds leader election takes anywhere 25- 60 seconds to finish. Note- we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyways so logs from that node shouldn't matter. Look for START HERE. Logs after that point should be of our interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-860) Add alternative search-provider to ZK site
[ https://issues.apache.org/jira/browse/ZOOKEEPER-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906223#action_12906223 ] Mahadev konar commented on ZOOKEEPER-860: - alex, the assignment just means that you are working on the patch currently. A committer will review and provide you feedback or commit if deemed fit for the project. Hope that helps. Add alternative search-provider to ZK site -- Key: ZOOKEEPER-860 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-860 Project: Zookeeper Issue Type: Improvement Components: documentation Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-860.patch Use search-hadoop.com service to make available search in ZK sources, MLs, wiki, etc. This was initially proposed on user mailing list (http://search-hadoop.com/m/sTZ4Y1BVKWg1). The search service was already added in site's skin (common for all Hadoop related projects) before (as a part of [AVRO-626|https://issues.apache.org/jira/browse/AVRO-626]) so this issue is about enabling it for ZK. The ultimate goal is to use it at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-860) Add alternative search-provider to ZK site
[ https://issues.apache.org/jira/browse/ZOOKEEPER-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-860: Fix Version/s: 3.4.0 marking it for 3.4 for keeping track. Add alternative search-provider to ZK site -- Key: ZOOKEEPER-860 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-860 Project: Zookeeper Issue Type: Improvement Components: documentation Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-860.patch Use search-hadoop.com service to make available search in ZK sources, MLs, wiki, etc. This was initially proposed on user mailing list (http://search-hadoop.com/m/sTZ4Y1BVKWg1). The search service was already added in site's skin (common for all Hadoop related projects) before (as a part of [AVRO-626|https://issues.apache.org/jira/browse/AVRO-626]) so this issue is about enabling it for ZK. The ultimate goal is to use it at all Hadoop's sub-projects' sites. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-856) Connection imbalance leads to overloaded ZK instances
[ https://issues.apache.org/jira/browse/ZOOKEEPER-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar reassigned ZOOKEEPER-856: --- Assignee: Mahadev konar Connection imbalance leads to overloaded ZK instances - Key: ZOOKEEPER-856 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-856 Project: Zookeeper Issue Type: Bug Reporter: Travis Crawford Assignee: Mahadev konar Fix For: 3.4.0 Attachments: zk_open_file_descriptor_count_members.gif, zk_open_file_descriptor_count_total.gif We've experienced a number of issues lately where ruok requests would take upwards of 10 seconds to return, and ZooKeeper instances were extremely sluggish. The sluggish instance requires a restart to make it responsive again. I believe the issue is connections are very imbalanced, leading to certain instances having many thousands of connections, while other instances are largely idle. A potential solution is periodically disconnecting/reconnecting to balance connections over time; this seems fine because sessions should not be affected, and therefore ephemaral nodes and watches should not be affected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-854) BookKeeper does not compile due to changes in the ZooKeeper code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903838#action_12903838 ] Mahadev konar commented on ZOOKEEPER-854: - +1 the patch looks good! BookKeeper does not compile due to changes in the ZooKeeper code Key: ZOOKEEPER-854 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-854 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Fix For: 3.4.0 Attachments: ZOOKEEPER-854.patch, ZOOKEEPER-854.patch BookKeeper does not compile due to changes in the NIOServerCnxn class of ZooKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-846) zookeeper client doesn't shut down cleanly on the close call
[ https://issues.apache.org/jira/browse/ZOOKEEPER-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-846: Fix Version/s: 3.3.2 We should fix this in 3.3.2 if this still exists. zookeeper client doesn't shut down cleanly on the close call Key: ZOOKEEPER-846 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-846 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.2.2 Reporter: Ted Yu Fix For: 3.3.2 Attachments: rs-13.stack Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to d...@hbase.apache.org but failed: DestroyJavaVM prio=10 tid=0x2aabb849c800 nid=0x6c60 waiting on condition [0x] java.lang.Thread.State: RUNNABLE regionserver/10.32.42.245:60020 prio=10 tid=0x2aabb84ce000 nid=0x6c81 in Object.wait() [0x43755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked 0x2aaab76633c0 (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked 0x2aaabf5e0c30 (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) main-EventThread daemon prio=10 tid=0x43474000 nid=0x6c80 waiting on condition [0x413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 
0x2aaabf6e9150 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
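The stack above shows ZooKeeper.close() blocked inside ClientCnxn.submitRequest waiting on the close packet. Until the underlying bug is fixed, a caller-side guard can at least bound the wait. This is a generic sketch, not the eventual fix; `SafeClose` and `closeWithTimeout` are hypothetical names, and the helper works for any blocking `close()`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical guard: run a potentially-hanging close() on a helper thread
// and give up after a deadline instead of blocking shutdown forever.
public class SafeClose {
    // Returns true if close() returned within timeoutMs, false if it hung.
    public static boolean closeWithTimeout(AutoCloseable c, long timeoutMs) {
        ExecutorService ex = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "safe-closer");
            t.setDaemon(true);          // don't keep the JVM alive if close() hangs
            return t;
        });
        try {
            Future<?> f = ex.submit(() -> { c.close(); return null; });
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;               // close() is stuck; caller can log and move on
        } catch (Exception e) {
            return true;                // close() failed, but it did return promptly
        } finally {
            ex.shutdownNow();
        }
    }
}
```

A shutdown path like the HRegionServer one above could call `closeWithTimeout(zk, 5000)` and log a warning on false rather than hang the whole process.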
[jira] Commented: (ZOOKEEPER-856) Connection imbalance leads to overloaded ZK instances
[ https://issues.apache.org/jira/browse/ZOOKEEPER-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12903017#action_12903017 ] Mahadev konar commented on ZOOKEEPER-856: - travis, we have had a lot of discussion on load balancing. I'd really want to try and see how the disconnect and reconnect works for load balancing. I am also with you that it might be a good enough solution for load balancing. I can upload a simple patch for this. Would you have some bandwidth to try it out and report how well it works? Connection imbalance leads to overloaded ZK instances - Key: ZOOKEEPER-856 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-856 Project: Zookeeper Issue Type: Bug Reporter: Travis Crawford Attachments: zk_open_file_descriptor_count_members.gif, zk_open_file_descriptor_count_total.gif We've experienced a number of issues lately where ruok requests would take upwards of 10 seconds to return, and ZooKeeper instances were extremely sluggish. The sluggish instance requires a restart to make it responsive again. I believe the issue is connections are very imbalanced, leading to certain instances having many thousands of connections, while other instances are largely idle. A potential solution is periodically disconnecting/reconnecting to balance connections over time; this seems fine because sessions should not be affected, and therefore ephemaral nodes and watches should not be affected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
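The periodic disconnect/reconnect idea can be sketched as each server shedding a small random fraction of its connections every tick, so load evens out over time while sessions (and hence ephemerals and watches) survive the disconnects. `Rebalancer` below is purely illustrative, an assumption about the shape of the approach, not the patch being offered.

```java
import java.util.Random;

// Illustrative sketch: per-connection coin flip each tick. With drop
// probability p per tick, a connection's expected lifetime is ~1/p ticks,
// so clients slowly re-spread across the ensemble on reconnect.
public class Rebalancer {
    private final Random rnd;
    private final double dropProbPerTick;

    public Rebalancer(double dropProbPerTick, long seed) {
        this.dropProbPerTick = dropProbPerTick;
        this.rnd = new Random(seed);
    }

    // Called once per connection per tick; true means "disconnect this
    // client now" (it will reconnect to a randomly chosen server).
    public boolean shouldDrop() {
        return rnd.nextDouble() < dropProbPerTick;
    }
}
```

A small drop probability (say 1% per tick) keeps churn negligible for any single client while still draining a hot instance over a few hundred ticks.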
[jira] Updated: (ZOOKEEPER-856) Connection imbalance leads to overloaded ZK instances
[ https://issues.apache.org/jira/browse/ZOOKEEPER-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-856: Fix Version/s: 3.4.0 Connection imbalance leads to overloaded ZK instances - Key: ZOOKEEPER-856 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-856 Project: Zookeeper Issue Type: Bug Reporter: Travis Crawford Fix For: 3.4.0 Attachments: zk_open_file_descriptor_count_members.gif, zk_open_file_descriptor_count_total.gif We've experienced a number of issues lately where ruok requests would take upwards of 10 seconds to return, and ZooKeeper instances were extremely sluggish. The sluggish instance requires a restart to make it responsive again. I believe the issue is connections are very imbalanced, leading to certain instances having many thousands of connections, while other instances are largely idle. A potential solution is periodically disconnecting/reconnecting to balance connections over time; this seems fine because sessions should not be affected, and therefore ephemaral nodes and watches should not be affected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-775: Attachment: ZOOKEEPER-775.patch this patch makes the following changes to the submitted patch: - removes ltmain.sh - removes client/src/main/cpp/m4/ax_jni_include_dir.m4 - removes author names A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-775: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Release Note: A pub/sub system using BookKeeper and ZooKeeper with C++ and Java client bindings. Resolution: Fixed I just committed this. Thanks Ivan, Erwin, Ben. It would be great if you guys can focus on improved documentation for this since it will be critical for adoption of the project. A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899930#action_12899930 ] Mahadev konar commented on ZOOKEEPER-775: - also, should I remove this file: {code} client/src/main/cpp/ltmain.sh {code} what about licensing on these files {code} client/src/main/cpp/m4 client/src/main/cpp/m4/ax_doxygen.m4 client/src/main/cpp/m4/ax_jni_include_dir.m4 {code} is it ok to distribute them via apache? A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-775: Comment: was deleted (was: Mahadev, we've been using the internal hedwig/apache svn branch for making changes, at least until things are committed to Apache's SVN. You can grab the latest changes there and incorporate them perhaps with your modifications. http://svn.corp.yahoo.com/view/yahoo/yrl/hedwig/apache/ Ivan is correct, the p12.pass file is just a dummy password file used for testing SSL connections between client and server. We need that otherwise the unit tests would fail.) A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897366#action_12897366 ] Mahadev konar commented on ZOOKEEPER-795: - ben, pinging again, can you provide a patch for 3.3.2? eventThread isn't shutdown after a connection session expired event coming Key: ZOOKEEPER-795 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: ubuntu 10.04 Reporter: mathieu barcikowski Assignee: Sergey Doroshenko Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, ZOOKEEPER-795.patch Hi, I noticed a problem with the eventThread located in the ClientCnxn.java file. The eventThread isn't shut down after a connection session expired event comes (i.e. it never receives EventOfDeath). When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread...) except for the eventThread. As a result, if I create a new zookeeper object and connect through it, I get a zombie thread which will never be killed (as for the previous zookeeper object, the state is already closed, so calling close again doesn't do anything). So every time I create a new zookeeper connection after an expired session, I will have one more zombie EventThread. How to reproduce: - Start a zookeeper client connection in debug mode - Pause the jvm long enough for the expired event to occur - Watch, for example with jvisualvm, the list of threads: the sendThread is successfully killed, but the EventThread goes into wait state forever - if you reopen a new zookeeper connection and repeat the previous steps, another EventThread will be present in infinite wait state -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-772) zkpython segfaults when watcher from async get children is invoked.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-772: Status: Resolved (was: Patch Available) Resolution: Fixed Luckily the 3.4 patch applies to the 3.3 branch. I just committed this. thanks henry! zkpython segfaults when watcher from async get children is invoked. --- Key: ZOOKEEPER-772 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-772 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Environment: ubuntu lucid (10.04) / zk trunk Reporter: Kapil Thangavelu Assignee: Henry Robinson Fix For: 3.3.2, 3.4.0 Attachments: asyncgetchildren.py, zkpython-testasyncgetchildren.diff, ZOOKEEPER-772.patch, ZOOKEEPER-772.patch When utilizing the zkpython async get children API with a watch, I consistently get segfaults when the watcher is invoked to process events. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-780) zkCli.sh generates a ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-780: Status: Open (was: Patch Available) zkCli.sh generates a ArrayIndexOutOfBoundsException - Key: ZOOKEEPER-780 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-780 Project: Zookeeper Issue Type: Bug Components: scripts Affects Versions: 3.3.1 Environment: Linux Ubuntu running in VMPlayer on top of Windows XP Reporter: Miguel Correia Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-780.patch, ZOOKEEPER-780.patch, ZOOKEEPER-780.patch I'm starting to play with ZooKeeper, so I'm still running it in standalone mode. This is not a big issue, but here it goes for the record. I've run zkCli.sh to run some commands in the server. I created a znode /groups. When I tried to create a znode client_1 inside /groups, I forgot to include the data: an exception was generated and zkCli.sh crashed instead of just showing an error. I tried a few variations and it seems the problem is leaving out the data. A copy of the screen: [zk: localhost:2181(CONNECTED) 3] create /groups firstgroup Created /groups [zk: localhost:2181(CONNECTED) 4] create -e /groups/client_1 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:678) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
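The stack trace points at command parsing that indexes the argument array without a bounds check, so a `create` with the data token missing throws `ArrayIndexOutOfBoundsException` instead of printing a usage error. A hedged sketch of the shape of the fix (this is not the actual ZooKeeperMain code; names are illustrative):

```java
public class CreateCmdGuard {
    // Parse "create [-s] [-e] path data", guarding before any array indexing.
    static String dataArg(String[] args) {
        int i = 1;
        while (i < args.length && args[i].startsWith("-")) {
            i++; // skip -s / -e flags
        }
        // need both args[i] (path) and args[i + 1] (data)
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException("usage: create [-s] [-e] path data");
        }
        return args[i + 1];
    }

    public static void main(String[] args) {
        // the session from the report: works with data, errors cleanly without
        System.out.println(dataArg(new String[] {"create", "/groups", "firstgroup"}));
        try {
            dataArg(new String[] {"create", "-e", "/groups/client_1"});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // usage error, not an AIOOBE
        }
    }
}
```

The point is only that every index into the token array is validated before use, so a malformed command is reported rather than crashing the CLI.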
[jira] Updated: (ZOOKEEPER-780) zkCli.sh generates a ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-780: Status: Patch Available (was: Open) retriggering the patch build. zkCli.sh generates a ArrayIndexOutOfBoundsException - Key: ZOOKEEPER-780 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-780 Project: Zookeeper Issue Type: Bug Components: scripts Affects Versions: 3.3.1 Environment: Linux Ubuntu running in VMPlayer on top of Windows XP Reporter: Miguel Correia Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-780.patch, ZOOKEEPER-780.patch, ZOOKEEPER-780.patch I'm starting to play with ZooKeeper, so I'm still running it in standalone mode. This is not a big issue, but here it goes for the record. I've run zkCli.sh to run some commands in the server. I created a znode /groups. When I tried to create a znode client_1 inside /groups, I forgot to include the data: an exception was generated and zkCli.sh crashed instead of just showing an error. I tried a few variations and it seems the problem is leaving out the data. A copy of the screen: [zk: localhost:2181(CONNECTED) 3] create /groups firstgroup Created /groups [zk: localhost:2181(CONNECTED) 4] create -e /groups/client_1 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:678) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897121#action_12897121 ] Mahadev konar commented on ZOOKEEPER-775: - also, you might want to get rid of this: {code} +% Developer's Guide +% Yang Zhang {code} in src/contrib/hedwig/doc/dev.txt Other than that this looks good. One question about protobufs: is it under the Apache license? What are we committing into ZooKeeper svn that's related to protobufs? A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897122#action_12897122 ] Mahadev konar commented on ZOOKEEPER-775: - sorry, I am reviewing and uploading my comments as I go; it would have been better for me to write them down in a file and upload them all at the same time :). - make sure that you list the files which need to be executable in svn. While committing, the svn executable flag is usually lost, so you will have to provide a list of files that need special svn flags. A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897126#action_12897126 ] Mahadev konar commented on ZOOKEEPER-775: - one more comment :) - we are checking in some autogenerated files like ltmain.sh and others. Why do we need to commit those? A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-800) zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar reassigned ZOOKEEPER-800: --- Assignee: Michi Mutsuzaki (was: Mahadev konar) zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE --- Key: ZOOKEEPER-800 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-800 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Reporter: Michi Mutsuzaki Assignee: Michi Mutsuzaki Priority: Minor Fix For: 3.3.2, 3.4.0 This happened when I called zoo_add_auth() immediately after zookeeper_init(). It took me a while to figure out that authentication actually failed since zoo_add_auth() returned ZOK. It should return ZINVALIDSTATE instead. --Michi -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
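The fix described is a state guard in the C client's `zoo_add_auth()`. A Java-flavored sketch of that guard (the numeric codes mirror the C client's `zookeeper.h`, `ZOK = 0` and `ZINVALIDSTATE = -9`; the state enum and function shape are illustrative, not the real C code):

```java
public class AddAuthGuard {
    static final int ZOK = 0;
    static final int ZINVALIDSTATE = -9; // error code from the C client's zookeeper.h

    enum State { CONNECTING, CONNECTED, CLOSED }

    static int addAuth(State handleState, String scheme, byte[] cert) {
        if (handleState == State.CLOSED) {
            // The bug: this path used to fall through and return ZOK,
            // silently dropping the auth request on a closed handle.
            return ZINVALIDSTATE;
        }
        // ... queue the auth packet for the server ...
        return ZOK;
    }

    public static void main(String[] args) {
        System.out.println(addAuth(State.CLOSED, "digest", new byte[0]));    // invalid state
        System.out.println(addAuth(State.CONNECTED, "digest", new byte[0])); // ok
    }
}
```

With the guard in place, calling add-auth immediately after a failed or closed init surfaces the error instead of leaving the caller to discover the missing authentication later.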
[jira] Assigned: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar reassigned ZOOKEEPER-795: --- Assignee: Sergey Doroshenko eventThread isn't shutdown after a connection session expired event coming Key: ZOOKEEPER-795 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: ubuntu 10.04 Reporter: mathieu barcikowski Assignee: Sergey Doroshenko Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, ZOOKEEPER-795.patch Hi, I noticed a problem with the eventThread located in the ClientCnxn.java file. The eventThread isn't shut down after a connection session expired event arrives (i.e. it never receives the EventOfDeath). When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread...) except for the eventThread. As a result, if I create a new ZooKeeper object and connect through it, I get a zombie thread that will never be killed (as the previous ZooKeeper object's state is already closed, calling close again doesn't do anything). So every time I create a new ZooKeeper connection after an expired session, I will have one more zombie EventThread. How to reproduce: - Start a ZooKeeper client connection in debug mode - Pause the JVM long enough for the expired event to occur - Watch the list of threads, for example with jvisualvm: the SendThread is successfully killed, but the EventThread goes into the wait state forever - If you reopen a new ZooKeeper connection and repeat the previous steps, another EventThread will be left in an infinite wait state -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895577#action_12895577 ] Mahadev konar commented on ZOOKEEPER-795: - I have committed this to trunk, but the patch does not apply to the 3.3.2 branch. Ben, can you provide a patch for 3.3.2? thanks eventThread isn't shutdown after a connection session expired event coming Key: ZOOKEEPER-795 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: ubuntu 10.04 Reporter: mathieu barcikowski Assignee: Sergey Doroshenko Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch, ZOOKEEPER-795.patch Hi, I noticed a problem with the eventThread located in the ClientCnxn.java file. The eventThread isn't shut down after a connection session expired event arrives (i.e. it never receives the EventOfDeath). When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread...) except for the eventThread. As a result, if I create a new ZooKeeper object and connect through it, I get a zombie thread that will never be killed (as the previous ZooKeeper object's state is already closed, calling close again doesn't do anything). So every time I create a new ZooKeeper connection after an expired session, I will have one more zombie EventThread. How to reproduce: - Start a ZooKeeper client connection in debug mode - Pause the JVM long enough for the expired event to occur - Watch the list of threads, for example with jvisualvm: the SendThread is successfully killed, but the EventThread goes into the wait state forever - If you reopen a new ZooKeeper connection and repeat the previous steps, another EventThread will be left in an infinite wait state -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893816#action_12893816 ] Mahadev konar commented on ZOOKEEPER-790: - +1 the patch looks good. I will go ahead and commit it. Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.v2.patch, ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790-test.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch, ZOOKEEPER-790.v2.patch, ZOOKEEPER-790.v2.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, the result is that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
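ZooKeeper packs the leader epoch into the high 32 bits of the 64-bit zxid, which is why bumping the epoch before reaching a quorum is dangerous: a "false leader" that later steps down finds its own epoch ahead of the cluster's. A small self-contained sketch of that encoding (the helper names are illustrative):

```java
public class ZxidSketch {
    // zxid layout: high 32 bits = epoch, low 32 bits = per-epoch counter
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static void main(String[] args) {
        long falseLeader = makeZxid(5, 0);  // epoch bumped before a quorum connected
        long realLeader  = makeZxid(4, 42); // cluster is still in epoch 4
        // When the false leader rejoins as a follower, the leader it connects
        // to has a smaller epoch - the comparison the follower code rejects.
        System.out.println(epochOf(falseLeader) > epochOf(realLeader));
    }
}
```

This is only the epoch arithmetic behind the report; the actual fix is about *when* the leader records the new-epoch zxid, not how it is encoded.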
[jira] Updated: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-790: Status: Resolved (was: Patch Available) Resolution: Fixed I just committed this. thanks flavio and others! Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.v2.patch, ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790-test.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch, ZOOKEEPER-790.v2.patch, ZOOKEEPER-790.v2.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893219#action_12893219 ] Mahadev konar commented on ZOOKEEPER-790: - flavio, the patch looks good. Any reason you have this code in LearnerHandler? {quote} @@ -475,6 +497,7 @@ } catch (IOException e) { LOG.warn("Ignoring unexpected exception during socket close", e); } +this.interrupt(); leader.removeLearnerHandler(this); } {quote} The interrupt? Why is this necessary? Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790-test.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch, ZOOKEEPER-790.v2.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, the result is that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893219#action_12893219 ] Mahadev konar edited comment on ZOOKEEPER-790 at 7/28/10 11:54 AM: --- flavio, the patch looks good. Any reason you have this code in LearnerHandler? {code} @@ -475,6 +497,7 @@ } catch (IOException e) { LOG.warn("Ignoring unexpected exception during socket close", e); } +this.interrupt(); leader.removeLearnerHandler(this); } {code} The interrupt? Why is this necessary? was (Author: mahadev): flavio, the patch looks good. Any reason you have this code in LearnerHandler? {quote} @@ -475,6 +497,7 @@ } catch (IOException e) { LOG.warn("Ignoring unexpected exception during socket close", e); } +this.interrupt(); leader.removeLearnerHandler(this); } {quote} The interrupt? Why is this necessary? Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790-test.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch, ZOOKEEPER-790.v2.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, the result is that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-792) zkpython memory leak
[ https://issues.apache.org/jira/browse/ZOOKEEPER-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892801#action_12892801 ] Mahadev konar commented on ZOOKEEPER-792: - henry, can you take a look at this? zkpython memory leak Key: ZOOKEEPER-792 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-792 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.1 Environment: vmware workstation - guest OS:Linux python:2.4.3 Reporter: Lei Zhang Assignee: Lei Zhang Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-792.patch We recently upgraded ZooKeeper from 3.2.1 to 3.3.1; now we are seeing fewer client deadlocks on session expiration, which is a definite plus! Unfortunately we are seeing a memory leak that requires our zk clients to be restarted every half-day. Valgrind result: ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in loss record 255 of 670 ==8804==at 0x4021C42: calloc (vg_replace_malloc.c:418) ==8804==by 0x5047B42: parse_acls (zookeeper.c:369) ==8804==by 0x5047EF6: pyzoo_create (zookeeper.c:1009) ==8804==by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0) ==8804==by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0) ==8804==by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-775) A large scale pub/sub system
[ https://issues.apache.org/jira/browse/ZOOKEEPER-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892803#action_12892803 ] Mahadev konar commented on ZOOKEEPER-775: - erwin/ben, would it be possible for you to convert the documentation into forrest? Documentation as part of the release, accessible via the website docs, will really help adoption of hedwig. A large scale pub/sub system Key: ZOOKEEPER-775 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-775 Project: Zookeeper Issue Type: New Feature Components: contrib Reporter: Benjamin Reed Assignee: Benjamin Reed Fix For: 3.4.0 Attachments: libs.zip, libs_2.zip, ZOOKEEPER-775.patch, ZOOKEEPER-775.patch, ZOOKEEPER-775_2.patch, ZOOKEEPER-775_3.patch we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-767) Submitting Demo/Recipe Shared / Exclusive Lock Code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892810#action_12892810 ] Mahadev konar commented on ZOOKEEPER-767: - sam, sorry for the delay in review. Any chance you could provide a C implementation of shared locks as well? Submitting Demo/Recipe Shared / Exclusive Lock Code --- Key: ZOOKEEPER-767 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-767 Project: Zookeeper Issue Type: Improvement Components: recipes Affects Versions: 3.3.0 Reporter: Sam Baskinger Assignee: Sam Baskinger Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-767.patch, ZOOKEEPER-767.patch, ZOOKEEPER-767.patch Networked Insights would like to share back some code for shared/exclusive locking that we are using in our labs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-765) Add python example script
[ https://issues.apache.org/jira/browse/ZOOKEEPER-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-765: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Release Note: A skeleton script that shows how to setup znode watches and how to react to events using the Python client libraries. (was: Travis Crawford wrote a skeleton script that shows how to setup znode watches and how to react to events using the Python client libraries. ) Resolution: Fixed +1 the patch looks good. I just committed this. thanks travis and andrei! Add python example script - Key: ZOOKEEPER-765 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-765 Project: Zookeeper Issue Type: Improvement Components: contrib-bindings, documentation Reporter: Travis Crawford Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: zk.py, ZOOKEEPER-765.patch, ZOOKEEPER-765.patch When adding some zookeeper-based functionality to a python script I had to figure everything out without guidance, which, while doable, would have been a lot easier with an example. I extracted a skeleton program structure in the hope it's useful to others (maybe add it as an example in the source or wiki?). This script does an aget() and sets a watch, and hopefully illustrates what's going on and where to plug in the application code that gets run when the znode changes. There are probably some bugs; if we fix them now and provide a well-reviewed example, hopefully others will not run into the same mistakes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-821) Add ZooKeeper version information to zkpython
[ https://issues.apache.org/jira/browse/ZOOKEEPER-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-821: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Release Note: Add a version number to zkpython releases. Resolution: Fixed I just reviewed this and based on henry's comment I just committed this. thanks rich! Add ZooKeeper version information to zkpython - Key: ZOOKEEPER-821 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-821 Project: Zookeeper Issue Type: Improvement Components: contrib-bindings Affects Versions: 3.3.1 Reporter: Rich Schumacher Assignee: Rich Schumacher Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-821.patch Since installing and using ZooKeeper I've built and installed no less than four versions of the zkpython bindings. It would be really helpful if the module had a '__version__' attribute to easily tell which version is currently in use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-814) monitoring scripts are missing apache license headers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892493#action_12892493 ] Mahadev konar commented on ZOOKEEPER-814: - +1 the patch looks good. Andrei, I think the monitoring tools and scripts are quite useful, though it would have been great to have forrest docs for all the plugins like ganglia and others, so that we can publish them as part of the release documentation! Please consider adding forrest documentation (it should not be hard at all). It would be really good to have the admin documentation link to these forrest docs in case users want to use them. Having all the documentation as forrest docs, available on the documentation website as part of the release, will greatly increase the adoption of such a plugin! monitoring scripts are missing apache license headers - Key: ZOOKEEPER-814 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-814 Project: Zookeeper Issue Type: Bug Components: contrib Reporter: Patrick Hunt Assignee: Andrei Savu Priority: Blocker Fix For: 3.4.0 Attachments: ZOOKEEPER-814.patch Andrei, I just realized that src/contrib/monitoring files are missing apache license headers. Please add them (in particular any script files like python; see similar files in svn for examples - in some cases, like README, it's not strictly necessary.) You can run the RAT tool to verify (see build.xml or http://incubator.apache.org/rat/) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-814) monitoring scripts are missing apache license headers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-814: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed I just committed this. thanks andrei! monitoring scripts are missing apache license headers - Key: ZOOKEEPER-814 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-814 Project: Zookeeper Issue Type: Bug Components: contrib Reporter: Patrick Hunt Assignee: Andrei Savu Priority: Blocker Fix For: 3.4.0 Attachments: ZOOKEEPER-814.patch Andrei, I just realized that src/contrib/monitoring files are missing apache license headers. Please add them (in particular any script files like python, see similar files in svn for examples - in some cases like README it's not strictly necessary.) You can run the RAT tool to verify (see build.xml or http://incubator.apache.org/rat/) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
[ https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892500#action_12892500 ] Mahadev konar commented on ZOOKEEPER-783: - +1 the patch looks good! committedLog in ZKDatabase is not properly synchronized --- Key: ZOOKEEPER-783 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Henry Robinson Assignee: Henry Robinson Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-783.patch ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller. I have seen a bug that causes an NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but in code that I have that starts and stops a ZK server in process repeatedly (clear() is called from ZooKeeperServerMain.shutdown()). It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
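The two remedies the report suggests - a defensive copy in the getter and synchronized mutation - can be sketched in a few lines (class and field names here are illustrative, not the exact ZKDatabase code):

```java
import java.util.LinkedList;
import java.util.List;

public class CommittedLogSketch {
    private final LinkedList<String> committedLog = new LinkedList<>();

    public synchronized void add(String proposal) {
        committedLog.add(proposal);
    }

    // Defensive copy: callers iterate a snapshot, so a concurrent clear()
    // can no longer invalidate their iterator (the reported NPE scenario).
    public synchronized List<String> getCommittedLog() {
        return new LinkedList<>(committedLog);
    }

    // Synchronized on the same monitor as the reads, so clear() never
    // races with a copy in progress.
    public synchronized void clear() {
        committedLog.clear();
    }

    public static void main(String[] args) {
        CommittedLogSketch db = new CommittedLogSketch();
        db.add("zxid-1");
        List<String> snapshot = db.getCommittedLog();
        db.clear(); // does not affect the snapshot already handed out
        System.out.println(snapshot.size());
        System.out.println(db.getCommittedLog().size());
    }
}
```

The trade-off of the defensive copy is an allocation per call; for a bounded committed-log buffer that is usually cheap compared to the race it removes.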
[jira] Updated: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
[ https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-783: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed I just committed this to trunk and the 3.3 branch. Thanks henry! committedLog in ZKDatabase is not properly synchronized --- Key: ZOOKEEPER-783 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Henry Robinson Assignee: Henry Robinson Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-783.patch ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller. I have seen a bug that causes an NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but in code that I have that starts and stops a ZK server in process repeatedly (clear() is called from ZooKeeperServerMain.shutdown()). It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-790) Last processed zxid set prematurely while establishing leadership
[ https://issues.apache.org/jira/browse/ZOOKEEPER-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892542#action_12892542 ] Mahadev konar commented on ZOOKEEPER-790: - sergey, your +1 does count :). Also, can you please upload the testcase that was failing? It would be good to include the testcase with the patch as pat mentioned above. Last processed zxid set prematurely while establishing leadership - Key: ZOOKEEPER-790 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-790 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-3.3.patch, ZOOKEEPER-790-follower-request-NPE.log, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.patch, ZOOKEEPER-790.travis.log.bz2, ZOOKEEPER-790.v2.patch The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-795) eventThread isn't shutdown after a connection session expired event coming
[ https://issues.apache.org/jira/browse/ZOOKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887854#action_12887854 ] Mahadev konar commented on ZOOKEEPER-795: - ben, can you take a look at this? eventThread isn't shutdown after a connection session expired event coming Key: ZOOKEEPER-795 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-795 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: ubuntu 10.04 Reporter: mathieu barcikowski Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ExpiredSessionThreadLeak.java, ZOOKEEPER-795.patch Hi, I noticed a problem with the eventThread located in the ClientCnxn.java file. The eventThread isn't shut down after a connection session expired event arrives (i.e. it never receives the EventOfDeath). When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread...) except for the eventThread. As a result, if I create a new ZooKeeper object and connect through it, I get a zombie thread which will never be killed (for the previous ZooKeeper object, the state is already closed, so calling close again doesn't do anything). So every time I create a new ZooKeeper connection after an expired session, I will have one more zombie EventThread. How to reproduce: - Start a ZooKeeper client connection in debug mode - Pause the JVM long enough for the expired event to occur - Watch the list of threads, for example with jvisualvm: the SendThread is successfully killed, but the EventThread goes into a wait state forever - If you reopen a new ZooKeeper connection and repeat the previous steps, another EventThread will be present in an infinite wait state -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
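The leak comes down to the "event of death" sentinel never being posted on session expiry, so the EventThread blocks forever on its queue. A runnable sketch of that sentinel pattern, with illustrative names standing in for ClientCnxn's actual fields:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class EventThreadSketch {
    // Sentinel playing the role of ClientCnxn's eventOfDeath (names are illustrative).
    private static final Object EVENT_OF_DEATH = new Object();
    private static final LinkedBlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        Thread eventThread = new Thread(() -> {
            try {
                while (true) {
                    Object event = queue.take(); // blocks forever if no sentinel arrives
                    if (event == EVENT_OF_DEATH) {
                        return;                  // clean shutdown
                    }
                    System.out.println("processed " + event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        eventThread.start();

        queue.put("SessionExpired");
        queue.put(EVENT_OF_DEATH); // omit this line and the thread parks in take() forever
        eventThread.join(5000);
        System.out.println("eventThread alive: " + eventThread.isAlive());
    }
}
```

With the sentinel posted, the thread drains its queue and exits; the reported bug is exactly the path where expiry closes everything else but never enqueues this sentinel.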
[jira] Updated: (ZOOKEEPER-783) committedLog in ZKDatabase is not properly synchronized
[ https://issues.apache.org/jira/browse/ZOOKEEPER-783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated ZOOKEEPER-783: Status: Patch Available (was: Open) I think we can do without a test on this one. Marking it PA. committedLog in ZKDatabase is not properly synchronized --- Key: ZOOKEEPER-783 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-783 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Henry Robinson Assignee: Henry Robinson Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-783.patch ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller. I have seen a bug that causes an NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but in code that I have that starts and stops a ZK server in-process repeatedly (clear() is called from ZooKeeperServerMain.shutdown()). It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-800) zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE
[ https://issues.apache.org/jira/browse/ZOOKEEPER-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar reassigned ZOOKEEPER-800: --- Assignee: Mahadev konar zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE --- Key: ZOOKEEPER-800 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-800 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.3.1 Reporter: Michi Mutsuzaki Assignee: Mahadev konar Priority: Minor Fix For: 3.3.2, 3.4.0 This happened when I called zoo_add_auth() immediately after zookeeper_init(). It took me a while to figure out that authentication actually failed since zoo_add_auth() returned ZOK. It should return ZINVALIDSTATE instead. --Michi -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-806) Cluster management with Zookeeper - Norbert
[ https://issues.apache.org/jira/browse/ZOOKEEPER-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885969#action_12885969 ] Mahadev konar commented on ZOOKEEPER-806: - great to see this!!! John, Here is a link on how to contribute to ZooKeeper: http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute Hope this helps. Cluster management with Zookeeper - Norbert --- Key: ZOOKEEPER-806 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-806 Project: Zookeeper Issue Type: New Feature Reporter: John Wang Hello, we have built a cluster management layer on top of Zookeeper here at the SNA team at LinkedIn: http://sna-projects.com/norbert/ We were wondering ways for collaboration as this is a very useful application of zookeeper. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-794) Callbacks are not invoked when the client is closed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886020#action_12886020 ] Mahadev konar commented on ZOOKEEPER-794: - alexis, Do you see the problem with the c client library as well? Or do you just see the problem with the java client? Callbacks are not invoked when the client is closed --- Key: ZOOKEEPER-794 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-794 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Reporter: Alexis Midon Assignee: Alexis Midon Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-794.patch.txt, ZOOKEEPER-794.txt I noticed that ZooKeeper behaves differently when calling synchronous versus asynchronous actions on a closed ZooKeeper client. A synchronous call will throw a session expired exception, while an asynchronous call will do nothing: no exception, no callback invocation. Even if the EventThread receives the Packet with the session expired error code, the packet is never processed since the thread has been killed by the eventOfDeath. So the callback is not invoked. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
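The asymmetry Alexis describes is easy to reproduce in isolation: once the event thread has consumed its death sentinel, anything still queued (or queued afterwards) is never drained, so an async completion is silently dropped. A hedged sketch of that failure mode, with hypothetical names rather than the real ClientCnxn internals:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class DroppedCallbackSketch {
    private static final Object EVENT_OF_DEATH = new Object();

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        AtomicInteger callbacksRun = new AtomicInteger();

        Thread eventThread = new Thread(() -> {
            try {
                Object event;
                while ((event = queue.take()) != EVENT_OF_DEATH) {
                    ((Runnable) event).run(); // invoke the queued callback
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        eventThread.start();

        queue.put(EVENT_OF_DEATH); // thread exits, as on client close()
        eventThread.join();

        // A completion packet arriving after the thread died: nobody drains it,
        // so the callback never runs and the caller gets no error either.
        queue.put((Runnable) callbacksRun::incrementAndGet);
        Thread.sleep(100);
        System.out.println("callbacks invoked: " + callbacksRun.get()); // prints 0
    }
}
```

This is why the synchronous path (which throws from the calling thread) still reports session expiry while the asynchronous path goes silent: the error packet reaches the queue, but the only thread that would turn it into a callback is already gone.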
[jira] Commented: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885716#action_12885716 ] Mahadev konar commented on ZOOKEEPER-804: - this is the same issue as ZOOKEEPER-707. It should be very hard to reproduce. This is surely a bug. I will upload a patch soon and we can get this in 3.3.2. c unit tests failing due to assertion cptr failed --- Key: ZOOKEEPER-804 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-804 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.4.0 Environment: gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel) Reporter: Patrick Hunt Assignee: Mahadev konar Priority: Critical Fix For: 3.3.2, 3.4.0 I'm seeing this frequently: [exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK [exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK [exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 25687 : OK [exec] zktest-mt: /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: zookeeper_process: Assertion `cptr' failed. [exec] make: *** [run-check] Aborted [exec] Zookeeper_simpleSystem::testHangingClient Mahadev can you take a look? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-804) c unit tests failing due to assertion cptr failed
[ https://issues.apache.org/jira/browse/ZOOKEEPER-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885725#action_12885725 ] Mahadev konar commented on ZOOKEEPER-804: - great... I can't reproduce it on my machine. I rarely see this error. c unit tests failing due to assertion cptr failed --- Key: ZOOKEEPER-804 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-804 Project: Zookeeper Issue Type: Bug Components: c client Affects Versions: 3.4.0 Environment: gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel) Reporter: Patrick Hunt Assignee: Mahadev konar Priority: Critical Fix For: 3.3.2, 3.4.0 I'm seeing this frequently: [exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK [exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK [exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 25687 : OK [exec] zktest-mt: /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: zookeeper_process: Assertion `cptr' failed. [exec] make: *** [run-check] Aborted [exec] Zookeeper_simpleSystem::testHangingClient Mahadev can you take a look? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.