[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934978#action_12934978 ]

Patrick Hunt commented on ZOOKEEPER-925:

See this for more background on CMS: http://www.apache.org/dev/infra-site.html

Consider maven site generation to replace our forrest site and documentation generation
---

Key: ZOOKEEPER-925
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
Project: Zookeeper
Issue Type: Wish
Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Attachments: ZOOKEEPER-925.patch, ZOOKEEPER-925.patch

See WHIRR-19 for some background. In Whirr we looked at a number of site/doc generation facilities. In the end, the Maven site generation plugin turned out to be by far the best option. You can see our nascent site here (no attempt at styling, etc. so far): http://incubator.apache.org/whirr/

In particular, take a look at the quick start guide, http://incubator.apache.org/whirr/quick-start-guide.html, which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence. Notice that this is standard wiki markup (Confluence wiki markup, the same markup that is available on Apache's wiki).

You can read more about the mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html

Notice that other formats are available, not just Confluence markup. Also note that you can mix markup formats within the same site. That is probably not a great idea in general, but in some cases it might be handy: for example, Whirr uses the Confluence wiki, so we can pretty much copy/paste source docs from the wiki to our site (svn) if we like.

Re Maven vs. our current Ant-based build: it's probably a good idea for us to move the build to Maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period.

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-880:
--

Assignee: Vishal K

QuorumCnxManager$SendWorker grows without bounds

Key: ZOOKEEPER-880
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Assignee: Vishal K
Priority: Critical
Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch

We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs out of native threads; at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:

{noformat}
tickTime=3000
dataDir=/somewhere_thats_not_tmp
clientPort=2181
initLimit=10
syncLimit=5
server.0=sv4borg9:2888:3888
server.1=sv4borg10:2888:3888
server.2=sv4borg11:2888:3888
server.3=sv4borg12:2888:3888
server.4=sv4borg13:2888:3888
{noformat}

The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
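In a config like this, each `server.N=host:port1:port2` entry names one ensemble member: the first port is the one followers use to connect to the leader, and the second is used for leader election, which is the traffic QuorumCnxManager (and its SendWorker threads) handles. A minimal Python sketch that pulls those entries out of a zoo.cfg-style string, for example to check which ports a monitoring tool is touching; the function and type names are hypothetical, and it only handles the plain `host:port:port` form shown above.

```python
from typing import Dict, NamedTuple


class ServerEntry(NamedTuple):
    host: str
    quorum_port: int    # followers connect to the leader on this port
    election_port: int  # leader election (QuorumCnxManager) traffic


def parse_servers(config_text: str) -> Dict[int, ServerEntry]:
    """Collect server.N=host:port:port lines from a zoo.cfg-style string."""
    servers: Dict[int, ServerEntry] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line.startswith("server.") or "=" not in line:
            continue  # skip tickTime, dataDir, comments, blank lines
        key, value = line.split("=", 1)
        sid = int(key[len("server."):])
        host, quorum, election = value.split(":")[:3]
        servers[sid] = ServerEntry(host, int(quorum), int(election))
    return servers


config = """
tickTime=3000
clientPort=2181
server.0=sv4borg9:2888:3888
server.1=sv4borg10:2888:3888
"""

for sid, entry in sorted(parse_servers(config).items()):
    print(sid, entry.host, entry.quorum_port, entry.election_port)
```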
[jira] Updated: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-880:
---

Fix Version/s: 3.4.0, 3.3.3
Status: Open (was: Patch Available)

We really should have a test for this case. Vishal, can you add one? (The more the better.)

QuorumCnxManager$SendWorker grows without bounds

Key: ZOOKEEPER-880
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Assignee: Vishal K
Priority: Critical
Fix For: 3.3.3, 3.4.0
Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz, ZOOKEEPER-880.patch
[jira] Updated: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-925:
---

Attachment: ZOOKEEPER-925.patch

This updated patch has a db2confluence.py script which attempts to convert our docs from simpledocbook to Confluence format. I converted the admin guide as an example. Try running "mvn site", then open the generated HTML file for the admin guide (in the target directory).

Notes:
1) Doxia's Confluence module doesn't support TOC generation, so we'd need to maintain the TOC by hand until they implement it.
2) Note sections are not supported. We'd have to reformat these a bit ourselves to make them work (probably by specifying a CSS class that provides a similar style).
3) xrefs in sdocbook pull in the text of the referenced section; the conversion tool does not do this. We'd have to add this as part of the conversion.

Take a look and let me know. Anyone could run the conversion on the other docs to see how it works there (I only ran it on the admin guide). Use db2confluence.py (redirecting stdout) to test it on the other files. The Python script (included in the patch) is pretty simple to tweak if we notice other elements that are not being converted (correctly).

Consider maven site generation to replace our forrest site and documentation generation
---

Key: ZOOKEEPER-925
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
Project: Zookeeper
Issue Type: Wish
Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Attachments: ZOOKEEPER-925.patch, ZOOKEEPER-925.patch
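The db2confluence.py script itself is not reproduced in this thread. As a rough illustration of the element-by-element mapping such a converter performs, here is a minimal Python sketch built on `xml.etree.ElementTree`. The function names are hypothetical, and it covers only a handful of DocBook elements (`section`/`title`, `para`, `emphasis`, `literal`, `ulink`, `programlisting`); like the limitations listed in this comment, it simply drops notes and xrefs rather than converting them.

```python
import xml.etree.ElementTree as ET
from typing import List


def inline(elem: ET.Element) -> str:
    """Render an element's mixed content as Confluence inline markup."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline(child)
        if child.tag == "emphasis":
            parts.append("_" + inner + "_")
        elif child.tag == "literal":
            parts.append("{{" + inner + "}}")
        elif child.tag == "ulink":
            parts.append("[" + inner + "|" + child.get("url", "") + "]")
        else:
            parts.append(inner)  # unknown inline element: keep its text only
        parts.append(child.tail or "")
    return "".join(parts)


def convert(elem: ET.Element, depth: int) -> List[str]:
    """Turn block-level children into Confluence blocks; sections nest."""
    blocks: List[str] = []
    for child in elem:
        if child.tag == "section":
            blocks.extend(convert(child, depth + 1))
        elif child.tag == "title":
            blocks.append(f"h{min(depth, 6)}. " + " ".join(inline(child).split()))
        elif child.tag == "para":
            blocks.append(" ".join(inline(child).split()))
        elif child.tag == "programlisting":
            body = (child.text or "").strip("\n")
            blocks.append("{noformat}\n" + body + "\n{noformat}")
        # Anything else (note, xref, ...) is silently dropped in this sketch.
    return blocks


def db2confluence(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    return "\n\n".join(convert(root, 1)) + "\n"


doc = """<article><title>Admin Guide</title>
<section><title>Deployment</title>
<para>Use <literal>zkServer.sh</literal> to start a
<emphasis>standalone</emphasis> server.</para>
</section></article>"""
print(db2confluence(doc))
```

Each nested `section` bumps the heading level (`h1.`, `h2.`, ...), which is roughly how a converter can keep the document outline without TOC support.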
[jira] Updated: (ZOOKEEPER-756) some cleanup and improvements for zooinspector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-756:
---

Status: Open (was: Patch Available)

Colin, it looks like you are having some weird line-ending problem... the patch is not applying.

some cleanup and improvements for zooinspector
--

Key: ZOOKEEPER-756
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-756
Project: Zookeeper
Issue Type: Improvement
Components: contrib
Affects Versions: 3.3.0
Reporter: Thomas Koch
Assignee: Colin Goodheart-Smithe
Fix For: 3.4.0
Attachments: Zooinspector-patch.patch, zooInspectorChanges.patch, zooInspectorChanges.patch, ZOOKEEPER-756.patch

Copied from the already closed ZOOKEEPER-678:

* Specify the exact URL where the icons are from. It's best to also include the link in the NOTICE.txt file.
* It seems that zooinspector finds its icons only if the icons folder is in the current path. But when I install zooinspector as part of the Zookeeper Debian package, I want to be able to call it regardless of the current path. Could you use getResource() or something similar, so that I can point to the icons location from the wrapper shell script?
* Can I place the zooinspector config files in /etc/zookeeper/zooinspector/? Could I give zooinspector a property to point to the config file location?
* There are several places where "viewers" is misspelled as "Veiwers". Please do a case-insensitive search for "veiw" to correct these. Even the config file defaultNodeVeiwers.cfg is misspelled like this. This has the potential to confuse the hell out of people when debugging something!
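The icon and config questions above come down to a lookup order that does not depend on the current directory. A minimal Python sketch of one possible order, assuming a hypothetical `zooinspector.config.dir` property that a wrapper script would set, then the /etc/zookeeper/zooinspector/ directory suggested above, then a directory shipped with the application. None of these names come from zooinspector itself, and since the real tool is Java, the bundled fallback there would more naturally be a classpath resource.

```python
import os
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical property name; zooinspector does not define it.
CONFIG_DIR_PROPERTY = "zooinspector.config.dir"


def resolve_config_dir(
    props: Mapping[str, str],
    fallbacks: Sequence[str],
    is_dir: Callable[[str], bool] = os.path.isdir,
) -> Optional[str]:
    """Pick a config directory without depending on the current directory.

    An explicit property wins outright (a wrapper script would set it);
    otherwise the first fallback that exists is used, or None if none do.
    """
    explicit = props.get(CONFIG_DIR_PROPERTY)
    if explicit:
        return explicit
    for candidate in fallbacks:
        if is_dir(candidate):
            return candidate
    return None


# Debian-style system location first, then a directory shipped with the app
# (the second path is made up for the example).
fallbacks = ["/etc/zookeeper/zooinspector", "/usr/share/zooinspector/config"]
print(resolve_config_dir({}, fallbacks))
```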
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933891#action_12933891 ]

Patrick Hunt commented on ZOOKEEPER-880:

Flavio (and others), we should update the docs to include details on which ports can/should be monitored and which ports should NOT be monitored (or, if monitoring is supported, any conditions that apply). Can we update the docs as part of any patch/fix? Thanks.

QuorumCnxManager$SendWorker grows without bounds

Key: ZOOKEEPER-880
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Priority: Critical
Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz
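Until those docs exist, here is a hedged sketch of a health check that stays on the client port: send the four-letter command `ruok` to clientPort (2181 in the config on this issue) and expect `imok` back. This assumes the four-letter commands are enabled on the server. It deliberately leaves the 2888/3888 peer ports from the server.N lines alone, since this comment suggests those are the ones that need caution. The function name and default timeout are assumptions.

```python
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Return True if the server at host:port answers 'ruok' with 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break  # server closed the connection early
                reply += chunk
    except OSError:
        return False  # refused, unreachable, or timed out
    return reply == b"imok"
```

A monitoring job would call something like `zk_ruok("sv4borg9")` for each server and alert on `False`. A server that is up but unhealthy simply does not answer, which the timeout turns into `False`.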
[jira] Updated: (ZOOKEEPER-936) zkpython is leaking ACL_vector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-936:
---

Priority: Critical (was: Major)
Fix Version/s: 3.4.0, 3.3.3
Assignee: Gustavo Niemeyer

zkpython is leaking ACL_vector
--

Key: ZOOKEEPER-936
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-936
Project: Zookeeper
Issue Type: Bug
Components: contrib-bindings
Reporter: Gustavo Niemeyer
Assignee: Gustavo Niemeyer
Priority: Critical
Fix For: 3.3.3, 3.4.0

It looks like there are no calls to deallocate_ACL_vector() within zookeeper.c in the zkpython binding, which means that (at least) the result of zoo_get_acl() must be leaking.
[jira] Updated: (ZOOKEEPER-935) Concurrent primitives library - shared lock
[ https://issues.apache.org/jira/browse/ZOOKEEPER-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-935:
---

Fix Version/s: 3.4.0
Assignee: ChiaHung Lin

Thanks for the patch! Slating for 3.4.0.

Concurrent primitives library - shared lock
---

Key: ZOOKEEPER-935
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-935
Project: Zookeeper
Issue Type: Improvement
Components: recipes
Environment: Debian squeeze, JDK 1.6.x, zookeeper trunk
Reporter: ChiaHung Lin
Assignee: ChiaHung Lin
Priority: Minor
Fix For: 3.4.0
Attachments: ZOOKEEPER-935.patch

I created this JIRA to add a shared lock function. The function follows the recipe at http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#Shared+Locks
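The Shared Locks recipe linked above boils down to a decision over the lock node's children after each getChildren() call. A minimal Python sketch of just that decision, assuming children are named `read-<seq>` / `write-<seq>` with ZooKeeper's zero-padded sequence suffix. It leaves out the actual create/getChildren/exists calls and is not the code in ZOOKEEPER-935.patch; `seq` and `node_to_watch` are hypothetical names.

```python
from typing import List, Optional


def seq(name: str) -> int:
    """Sequence number ZooKeeper appended to a sequential node name."""
    return int(name.rsplit("-", 1)[1])


def node_to_watch(me: str, children: List[str]) -> Optional[str]:
    """Return None if `me` holds the lock, else the node to watch with exists().

    A read lock waits only on lower-numbered write- nodes; a write lock waits
    on any lower-numbered node. Either way only the closest such node is
    watched, so a release wakes one client instead of all of them.
    """
    mine = seq(me)
    is_reader = me.startswith("read-")
    blockers = [
        c for c in children
        if seq(c) < mine and (not is_reader or c.startswith("write-"))
    ]
    if not blockers:
        return None
    return max(blockers, key=seq)  # the next-lowest sequence number below ours


kids = ["read-0000000001", "write-0000000002", "read-0000000003", "write-0000000004"]
print(node_to_watch("read-0000000001", kids))   # None: no earlier writer
print(node_to_watch("read-0000000003", kids))   # write-0000000002
print(node_to_watch("write-0000000004", kids))  # read-0000000003
```

If exists() on the returned node reports it is already gone, the recipe says to go back and call getChildren() again rather than assume the lock is held.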
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933018#action_12933018 ]

Patrick Hunt commented on ZOOKEEPER-925:

@Alex, looks good to me. We're new to Maven-based site generation: what are the implications on our side of adding a site.vm file? Is that not something we can specify with site.xml?

Consider maven site generation to replace our forrest site and documentation generation
---

Key: ZOOKEEPER-925
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
Project: Zookeeper
Issue Type: Wish
Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Attachments: ZOOKEEPER-925.patch
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933024#action_12933024 ]

Patrick Hunt commented on ZOOKEEPER-900:

I see some great information about how the code/algorithms operate being detailed in these JIRAs. I highly encourage you guys to document this stuff, either in the code or in a separate document available on the wiki/forrest (now mvn site, whatever). It's critical that we provide more details like this to our devs. See ZOOKEEPER-918 for a great example of what I'm talking about (although adding more comments to the code is fine too). Basically, if you find yourself describing something in a JIRA that's not already documented, consider documenting it. Thanks.

FLE implementation should be improved to use non-blocking sockets
-

Key: ZOOKEEPER-900
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
Project: Zookeeper
Issue Type: Bug
Reporter: Vishal K
Assignee: Vishal K
Priority: Critical
Fix For: 3.4.0
Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2

From earlier email exchanges:

1. Blocking connects and accepts:

a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne() starts a timer. connectOne() continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires, it interrupts the AsyncConnect() thread and returns. In this way, I can put an upper bound on the time we wait for a connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later, once we at least have a quick fix for the problem and consensus from others on the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SendWorker and RecvWorker threads, since they block on IO to their respective peer.

b) The blocking IO problem is not restricted to connectOne(); it also occurs in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection() does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that sent the connection request. All of this happens from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). The code also has an inherent cycle: initiateConnection() and receiveConnection() have to be very carefully synchronized, otherwise we could run into deadlocks. This code is going to be difficult to maintain/modify.

Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
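The quick fix above bounds connectOne() with a helper thread plus a timer, while the preferred fix uses a Selector. The same idea can be sketched in Python: a non-blocking socket plus the `selectors` module (Python's counterpart to Java NIO's Selector) puts an upper bound on a connect without a helper thread. This is a minimal sketch assuming POSIX socket semantics and a numeric IP address; the function name is hypothetical, and it is not QuorumCnxManager code.

```python
import errno
import os
import selectors
import socket


def connect_with_deadline(ip: str, port: int, timeout: float) -> socket.socket:
    """Open a TCP connection, giving up after `timeout` seconds.

    `ip` is assumed to be a numeric address: name resolution is not covered
    by the non-blocking socket and would still block.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex((ip, port))
        if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise OSError(rc, os.strerror(rc))  # e.g. refused right away
        if rc != 0:
            with selectors.DefaultSelector() as sel:
                sel.register(sock, selectors.EVENT_WRITE)
                if not sel.select(timeout):
                    raise TimeoutError(f"connect to {ip}:{port} timed out")
            # Writable means the handshake finished, successfully or not.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, os.strerror(err))
    except BaseException:
        sock.close()
        raise
    # Still non-blocking; a caller that wants blocking IO (as the comment
    # above allows for SendWorker/RecvWorker) can call sock.setblocking(True).
    return sock
```

The same selector approach could also cover point b): if accepted sockets are registered for readability instead of being read synchronously, a peer that dies mid-handshake would stall only its own connection rather than the whole Listener.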
[jira] Commented: (ZOOKEEPER-902) Fix findbug issue in trunk Malicious code vulnerability
[ https://issues.apache.org/jira/browse/ZOOKEEPER-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933020#action_12933020 ]

Patrick Hunt commented on ZOOKEEPER-902:

Sounds good. Let's clear out 900 first; once 900 is committed, we can adjust the OK setting back to 0 (as part of this JIRA).

Fix findbug issue in trunk Malicious code vulnerability
-

Key: ZOOKEEPER-902
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-902
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Priority: Minor
Fix For: 3.4.0

https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/970/artifact/trunk/findbugs/zookeeper-findbugs-report.html#Warnings_MALICIOUS_CODE

Malicious code vulnerability Warnings

Code  Warning
MS    org.apache.zookeeper.server.quorum.LeaderElection.epochGen isn't final but should be
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12933074#action_12933074 ]

Patrick Hunt commented on ZOOKEEPER-925:

TANSTAAFL ;-) You'd have to modify db2rst a bit to get it to output Confluence tags; iirc that was pretty easy to do (plus it's in Python). Or find an rst-to-Confluence converter (I looked a bit last night but didn't find one; would the Doxia converter work?)

Consider maven site generation to replace our forrest site and documentation generation
---

Key: ZOOKEEPER-925
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
Project: Zookeeper
Issue Type: Wish
Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Attachments: ZOOKEEPER-925.patch
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932590#action_12932590 ]

Patrick Hunt commented on ZOOKEEPER-925:

At this point, to move forward we need to work on two main areas:

1) site generation
2) doc conversion

Item 1) is looking pretty good, but there's some work yet to be done, mainly icons and look/feel.

Item 2) just requires us to decide which format, if any, we want to standardize on, and then try moving some docs to that format. I would highly suggest that our standard/preferred format be Confluence format; we can move our wiki to Confluence at some point, which will match up nicely.

Consider maven site generation to replace our forrest site and documentation generation
---

Key: ZOOKEEPER-925
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925
Project: Zookeeper
Issue Type: Wish
Components: documentation
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Attachments: ZOOKEEPER-925.patch
[jira] Commented: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932607#action_12932607 ]

Patrick Hunt commented on ZOOKEEPER-896:

The security docs (both client side and server side; the plugin architecture is totally undocumented) are sorely in need of improvement. If you are in the area and would like to help out, adding docs would be huge. Thanks.

Improve C client to support dynamic authentication schemes
--

Key: ZOOKEEPER-896
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896
Project: Zookeeper
Issue Type: Improvement
Components: c client
Affects Versions: 3.3.1
Reporter: Botond Hejj
Assignee: Botond Hejj
Fix For: 3.4.0
Attachments: ZOOKEEPER-896.patch

When we started exploring ZooKeeper for our requirements, we found that the authentication mechanism is not flexible enough. We want to use Kerberos for authentication, but with the current API we ran into a few problems. The idea is that we get a Kerberos token on the client side and then send that token to the server with a kerberos scheme. A server-side authentication plugin can use that token to authenticate the client and also use it for authorization.

We ran into two problems with this approach:

1. A different Kerberos token is needed for each server the client can connect to, since Kerberos uses mutual authentication. That means that when the client acquires the token, it has to know which server it is connecting to and generate the token accordingly. Currently the client can't generate a token for a specific server: the token stored in auth_info is used for all servers.

2. The Kerberos token might have an expiry time, so if the client loses its connection to the server and then tries to reconnect, it should acquire a new token. That is currently not possible, since the token is stored in auth_info and reused for every connection.

The problem can be solved if we allow the client to register a callback for authentication instead of a static token. This could be a callback with an argument that passes the current host string. The ZooKeeper client code could call this callback before it sends the authentication info to the server, to get a fresh server-specific token. This would solve our problem with Kerberos authentication and could also be used for other, more dynamic authentication schemes. The solution could be generalized to the Java client as well.
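The proposal replaces the static token in auth_info with a callback that receives the current host string and is invoked before each (re)connect. The C API change itself is not shown here. As an illustration of the control flow only, here is a minimal Python sketch; the `Client` class, its `connect_to` method and the token format are hypothetical, and the stand-in provider merely mimics a per-host, freshly minted Kerberos-style ticket.

```python
from typing import Callable, List, Tuple

# A provider maps the host being connected to onto a (scheme, token) pair,
# mirroring the scheme/credential split of the existing zoo_add_auth() call.
AuthProvider = Callable[[str], Tuple[str, bytes]]


class Client:
    """Hypothetical client that asks for credentials on every connect."""

    def __init__(self, auth_provider: AuthProvider):
        self.auth_provider = auth_provider
        self.sent: List[Tuple[str, str, bytes]] = []  # record of auth packets

    def connect_to(self, host: str) -> None:
        # Fetch credentials per connection, so each server gets its own
        # token and an expired token is never replayed on reconnect.
        scheme, token = self.auth_provider(host)
        self.sent.append((host, scheme, token))  # stand-in for the network send


counter = {"n": 0}


def fake_kerberos(host: str) -> Tuple[str, bytes]:
    counter["n"] += 1  # pretend each call mints a fresh ticket
    return "kerberos", f"ticket-for-{host}-{counter['n']}".encode()


c = Client(fake_kerberos)
c.connect_to("zk1:2181")
c.connect_to("zk2:2181")  # e.g. after losing the connection to zk1
for host, scheme, token in c.sent:
    print(host, scheme, token.decode())
```

The design point is simply where the token is produced: at connect time with the target host in hand, rather than once up front.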
[jira] Commented: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932708#action_12932708 ]

Patrick Hunt commented on ZOOKEEPER-366:

FYI, this came up again today on the hbase list:

14:59 _hp_ man this system time update on a bunch of machines causing zookeeper session timeouts causing hr's to die is really taking its toll, count on a table now hangs, i disabled and enabled the table, tried count again, and it hangs at the same place still. Arg.

Ben, any progress on this? Should we try to get it into 3.3.3?

Session timeout detection can go wrong if the leader system time changes

Key: ZOOKEEPER-366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
Project: Zookeeper
Issue Type: Bug
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Attachments: ZOOKEEPER-366.patch

The leader tracks session expirations by calculating when each session will time out and then periodically checking what needs to be timed out based on the current time. This works great as long as the leader's clock progresses at a steady pace. The problem comes when there are big (session-timeout-sized) changes in the clock, made by NTP for example. If time gets adjusted forward, all the sessions could time out immediately. If time goes backward, sessions that should time out may take a lot longer to actually expire.

This is really just a leader issue. The easiest way to deal with it is to have the leader relinquish leadership if it detects a big jump forward in time. When a new leader gets elected, it will recalculate the timeouts of active sessions.
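The proposed mitigation needs some way to notice a "big jump forward in time". One minimal Python sketch: between periodic ticks, compare elapsed wall-clock time against a monotonic clock, and treat any difference larger than a chosen threshold (for example, one session timeout) as a jump. The class name and threshold are assumptions, not the ZOOKEEPER-366 patch. Computing expirations from a monotonic clock in the first place is the other common way around this problem.

```python
import time
from typing import Callable


class ClockJumpDetector:
    """Flags wall-clock jumps by comparing them against a monotonic clock."""

    def __init__(self, threshold_s: float,
                 wall: Callable[[], float] = time.time,
                 mono: Callable[[], float] = time.monotonic):
        self.threshold_s = threshold_s
        self.wall, self.mono = wall, mono
        self.last_wall, self.last_mono = wall(), mono()

    def check(self) -> float:
        """Return the skew (seconds) since the last check if it exceeds the
        threshold in either direction, else 0.0. Positive = jumped forward."""
        now_wall, now_mono = self.wall(), self.mono()
        skew = (now_wall - self.last_wall) - (now_mono - self.last_mono)
        self.last_wall, self.last_mono = now_wall, now_mono
        return skew if abs(skew) > self.threshold_s else 0.0


# Injected fake clocks make the behaviour easy to see.
fake = {"wall": 1000.0, "mono": 50.0}
det = ClockJumpDetector(threshold_s=30.0,
                        wall=lambda: fake["wall"], mono=lambda: fake["mono"])
fake["wall"] += 2.0; fake["mono"] += 2.0     # a normal 2 s tick
print(det.check())                           # 0.0
fake["wall"] += 3602.0; fake["mono"] += 2.0  # NTP steps the clock forward 1 h
print(det.check())                           # 3600.0
```

Under the proposal, a leader would step down when `check()` returns a large positive value. A large negative value corresponds to the "sessions take longer to expire" case, which the proposal leaves alone.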
[jira] Updated: (ZOOKEEPER-366) Session timeout detection can go wrong if the leader system time changes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-366:
---

Component/s: server, quorum
Fix Version/s: 3.4.0, 3.3.3

Session timeout detection can go wrong if the leader system time changes

Key: ZOOKEEPER-366
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-366
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Reporter: Benjamin Reed
Assignee: Benjamin Reed
Fix For: 3.3.3, 3.4.0
Attachments: ZOOKEEPER-366.patch
[jira] Updated: (ZOOKEEPER-756) some cleanup and improvements for zooinspector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-756:
---

Status: Open (was: Patch Available)

some cleanup and improvements for zooinspector
--

Key: ZOOKEEPER-756
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-756
Project: Zookeeper
Issue Type: Improvement
Components: contrib
Affects Versions: 3.3.0
Reporter: Thomas Koch
Assignee: Colin Goodheart-Smithe
Fix For: 3.4.0
Attachments: zooInspectorChanges.patch, ZOOKEEPER-756.patch
[jira] Updated: (ZOOKEEPER-756) some cleanup and improvements for zooinspector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-756: --- Status: Patch Available (was: Open) some cleanup and improvements for zooinspector -- Key: ZOOKEEPER-756 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-756 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.0 Reporter: Thomas Koch Assignee: Colin Goodheart-Smithe Fix For: 3.4.0 Attachments: zooInspectorChanges.patch, ZOOKEEPER-756.patch
[jira] Updated: (ZOOKEEPER-756) some cleanup and improvements for zooinspector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-756: --- Attachment: ZOOKEEPER-756.patch Looks like line endings are the problem. The current source in svn has some lines with ^M (CRLF) line endings, and this seems to be breaking patch application (at least under Unix). I tried it myself and had similar problems (I'm on Unix). I then used dos2unix to convert all the zooinspector files to Unix line endings, reapplied Colin's patch, and it applied cleanly. I created a new patch file based on this, which I'm now attaching. Please review this updated patch and make sure I didn't miss anything. some cleanup and improvements for zooinspector -- Key: ZOOKEEPER-756 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-756 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.0 Reporter: Thomas Koch Assignee: Colin Goodheart-Smithe Fix For: 3.4.0 Attachments: zooInspectorChanges.patch, ZOOKEEPER-756.patch
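For reference, the dos2unix step amounts to turning each CRLF pair into a bare LF. A minimal byte-level Java sketch (the LineEndings class is illustrative, not part of the patch); working on bytes avoids any charset assumptions, and lone CRs are left untouched.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative dos2unix-style conversion: CRLF becomes LF, lone CRs are kept. */
final class LineEndings {
    static byte[] toUnix(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            // Skip a CR only when the next byte is LF, so CRLF collapses to LF.
            if (in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n') {
                continue;
            }
            out.write(in[i]);
        }
        return out.toByteArray();
    }

    /** Rewrites a file in place with Unix line endings. */
    static void convertFile(Path file) throws IOException {
        Files.write(file, toUnix(Files.readAllBytes(file)));
    }
}
```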
[jira] Updated: (ZOOKEEPER-756) some cleanup and improvements for zooinspector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-756: --- Status: Open (was: Patch Available) Cancelling patch. Please address the @author issues (remove the tags). The javadoc warning seems to be caused by the link being accessed, not by the change itself. some cleanup and improvements for zooinspector -- Key: ZOOKEEPER-756 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-756 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.0 Reporter: Thomas Koch Assignee: Colin Goodheart-Smithe Fix For: 3.4.0 Attachments: zooInspectorChanges.patch, ZOOKEEPER-756.patch
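The QA bot checks that a patch contains no @author tags. A rough Java sketch of such a check, assuming the patch is a unified diff: it flags only added lines (those starting with '+', excluding the '+++' file header) that contain @author. AuthorTagCheck is a hypothetical name; the real check lives in the Hudson test-patch tooling and may work differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-commit check: added diff lines that contain an @author tag. */
final class AuthorTagCheck {
    static List<String> addedAuthorLines(List<String> patchLines) {
        List<String> hits = new ArrayList<>();
        for (String line : patchLines) {
            // Added lines start with '+'; "+++" is the new-file header, not content.
            boolean added = line.startsWith("+") && !line.startsWith("+++");
            if (added && line.contains("@author")) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

Removed lines ('-') and context lines (' ') are ignored, so deleting an existing @author tag does not trip the check.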
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932118#action_12932118 ] Patrick Hunt commented on ZOOKEEPER-900: I'd appreciate it if you could fix the findbugs warnings. See also ZOOKEEPER-902; as part of the fix, set the acceptable findbugs warning count back to 0. Thanks! FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2 From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires, it interrupts the AsyncConnect() thread and returns. In this way, I can put an upper bound on the amount of time we need to wait for a connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later, once we at least have a quick fix for the problem and consensus from others on the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SenderWorker and RecvWorker threads, since each one blocks only on IO to its respective peer. b) The blocking IO problem is not restricted to connectOne(); it also occurs in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get the peer's info (s.read(msgBuffer)).
Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). The code also has an inherent cycle: initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
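The Selector-based approach suggested above can be sketched for the connect side as follows. BoundedConnect is a hypothetical helper, not QuorumCnxManager code: it opens the channel in non-blocking mode, waits for OP_CONNECT until a deadline, and switches back to blocking mode on success, since the description says blocking IO in the per-peer SenderWorker and RecvWorker threads is acceptable.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: connect without ever blocking longer than timeoutMillis. */
final class BoundedConnect {
    static SocketChannel connect(InetSocketAddress addr, long timeoutMillis) throws IOException {
        SocketChannel ch = SocketChannel.open();
        try (Selector selector = Selector.open()) {
            ch.configureBlocking(false);
            // connect() may complete immediately; otherwise wait for OP_CONNECT.
            if (!ch.connect(addr)) {
                SelectionKey key = ch.register(selector, SelectionKey.OP_CONNECT);
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
                // finishConnect() returns false while pending and throws if the attempt failed.
                while (!ch.finishConnect()) {
                    long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remaining <= 0) {
                        throw new SocketTimeoutException("connect to " + addr + " timed out");
                    }
                    selector.select(remaining);
                    selector.selectedKeys().clear();
                }
                key.cancel();
                selector.selectNow(); // completes deregistration of the cancelled key
            }
            // Per-peer worker threads may use blocking IO, so hand back a blocking channel.
            ch.configureBlocking(true);
            return ch;
        } catch (IOException e) {
            ch.close();
            throw e;
        }
    }
}
```

Unlike the interim AsyncConnect thread, this needs no extra thread or interrupt. The same Selector pattern (OP_ACCEPT, then OP_READ for the peer's info) could keep the Listener from blocking in receiveConnection(), but that part is not shown.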
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931444#action_12931444 ] Patrick Hunt commented on ZOOKEEPER-900: Flavio, I'd be worried that different TCP stacks might (inter)operate differently in practice than in theory. In general it's pretty tough to get this right; look at all the problems we've been having with netcat behavior: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truequery=netcatsummary=truedescription=truebody=truepid=12310801 Ubuntu recently moved from the traditional nc to the newer BSD flavor (which supports IPv6 natively), and we are back to having issues after having made significant changes in 3.3 to fix this (including a number of tests that simulated the nc behavior as closely as we could understand it). FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0
[jira] Commented: (ZOOKEEPER-860) Add alternative search-provider to ZK site
[ https://issues.apache.org/jira/browse/ZOOKEEPER-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931462#action_12931462 ] Patrick Hunt commented on ZOOKEEPER-860: Hi Alex, Otis. Take a look at ZOOKEEPER-925. I think this is a good time (new site generation, and a new site once/if we get approved as a TLP) to introduce this change. Perhaps you could update the sitegen to include this? It would give people a chance to try it out. Regards. Add alternative search-provider to ZK site -- Key: ZOOKEEPER-860 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-860 Project: Zookeeper Issue Type: Improvement Components: documentation Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-860.patch Use the search-hadoop.com service to make search available across ZK sources, MLs, wiki, etc. This was initially proposed on the user mailing list (http://search-hadoop.com/m/sTZ4Y1BVKWg1). The search service was already added to the site's skin (common to all Hadoop-related projects) as part of [AVRO-626|https://issues.apache.org/jira/browse/AVRO-626], so this issue is about enabling it for ZK. The ultimate goal is to use it on all Hadoop sub-projects' sites.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931466#action_12931466 ] Patrick Hunt commented on ZOOKEEPER-900: I don't know about this specific case, but in the corners I've looked at (tearing down a connection) there have been issues. Perhaps they are issues on our side, I'm not certain, but I do know that we fail with this version of nc (the default in Ubuntu Maverick) even after significant work was done to address the original problem: OpenBSD netcat (Debian patchlevel 1.89-3ubuntu2) Let's assume what you say is correct; we'd still want to test this carefully to assure ourselves. FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931487#action_12931487 ] Patrick Hunt commented on ZOOKEEPER-900: Please keep reformatting changes to a minimum unless the code is directly being worked on. Otherwise the patch is harder to review (svn diff -x -w does help, but still) and blame detail is lost. FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-900.patch1
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931497#action_12931497 ] Patrick Hunt commented on ZOOKEEPER-900: Looking at the patch: quite a bit changed, and it's hard to tell which changes are important and which are not. In these situations I've used the -w diff trick to get just the substantive changes, then applied that patch to pristine code, opened the file in Eclipse, and fixed the (relatively) smaller set of formatting issues. Also, the patch includes a log4j.properties change; I'm thinking you don't want to include that. FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-900.patch1
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931554#action_12931554 ] Patrick Hunt commented on ZOOKEEPER-900: FYI, if a patch is ready for review/commit, click the "Submit Patch" link; this triggers the workflow. Also, if you re-attach using the same patch name (ZOOKEEPER-###.patch), JIRA will handle this correctly. More detail here: http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute Thanks! FLE implementation should be improved to use non-blocking sockets - Key: ZOOKEEPER-900 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2
[jira] Created: (ZOOKEEPER-929) hudson qabot incorrectly reporting issues as number 909 when the patch from 908 is the one being tested
hudson qabot incorrectly reporting issues as number 909 when the patch from 908 is the one being tested --- Key: ZOOKEEPER-929 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-929 Project: Zookeeper Issue Type: Bug Components: build Reporter: Patrick Hunt Assignee: Nigel Daley Hi Nigel, can you take a look at this? Below is the email I got. Notice that the patch is the 908 patch; however, the Hudson page it links to documents the change as the 909 patch file being applied: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25/changes I looked at both JIRAs, ZOOKEEPER-908 and ZOOKEEPER-909; both look good (the right names on the patches), and qabot actually updated 908 with the comment (failure). However, the change is listed as 909, which is wrong.
[exec] -1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12459361/ZOOKEEPER-908.patch
[exec] against trunk revision 1033770.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 core tests. The patch passed core unit tests.
[exec]
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec]
[exec] Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//testReport/
[exec] Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//console
[exec]
[exec] This message is automatically generated.
[exec]
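Since the attachment URL in the report names ZOOKEEPER-908 while the Hudson change page says 909, a bot could cross-check the two by deriving the issue key from the attachment file name. A small Java sketch, assuming patches follow the ZOOKEEPER-###.patch naming convention; PatchIssueKey is a hypothetical name, and this is not how the qabot is known to work.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extract the JIRA issue key from a patch attachment URL. */
final class PatchIssueKey {
    // Matches a trailing "/KEY-123.patch", e.g. ".../ZOOKEEPER-908.patch".
    private static final Pattern KEY = Pattern.compile("/([A-Z][A-Z0-9]*-\\d+)\\.patch$");

    static Optional<String> fromAttachmentUrl(String url) {
        Matcher m = KEY.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Names that break the convention (for example a ".patch1" suffix) yield an empty result, which a bot could treat as "cannot verify" rather than guessing.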
[jira] Updated: (ZOOKEEPER-908) Remove code duplication and inconsistent naming in ClientCnxn.Packet creation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-908: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1 change looks good to me. Thanks Thomas! Committed to trunk. Remove code duplication and inconsistent naming in ClientCnxn.Packet creation - Key: ZOOKEEPER-908 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-908 Project: Zookeeper Issue Type: Sub-task Components: server Reporter: Thomas Koch Assignee: Thomas Koch Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-908.patch Rename record to request (since there is a counterpart record named response). Rename header to requestHeader (to distinguish it from responseHeader). Remove the ByteBuffer creation code from the primeConnection() method and use the duplicate code in the Packet constructor instead. The ByteBuffer bb parameter can then also be removed from the constructor's parameters.
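A simplified Java sketch of building the framed ByteBuffer once, in the Packet constructor, so that callers such as primeConnection() no longer need their own copy. The classes are stand-ins resting on assumptions: the header is just two ints, the request is already-serialized bytes rather than a Jute record, and a null requestHeader is allowed for a connect-style request that has no header.

```java
import java.nio.ByteBuffer;

/** Simplified stand-in for a request header (the real one is a Jute record). */
final class RequestHeader {
    final int xid;
    final int type;

    RequestHeader(int xid, int type) {
        this.xid = xid;
        this.type = type;
    }
}

/** Simplified Packet: the framed buffer is built here, not by each caller. */
final class Packet {
    final RequestHeader requestHeader; // null for a connect-style request (assumption)
    final byte[] request;              // already-serialized request body
    final ByteBuffer bb;

    Packet(RequestHeader requestHeader, byte[] request) {
        this.requestHeader = requestHeader;
        this.request = request;
        int bodyLen = (requestHeader == null ? 0 : 8) + request.length;
        ByteBuffer buf = ByteBuffer.allocate(4 + bodyLen);
        buf.putInt(bodyLen); // length prefix counts only the bytes after it
        if (requestHeader != null) {
            buf.putInt(requestHeader.xid).putInt(requestHeader.type);
        }
        buf.put(request);
        buf.flip(); // ready for writing to a channel
        this.bb = buf;
    }
}
```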
[jira] Commented: (ZOOKEEPER-780) zkCli.sh generates a ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931092#action_12931092 ] Patrick Hunt commented on ZOOKEEPER-780: Agreed (there were no previous tests), but this really highlights that there should be. Thanks! zkCli.sh generates a ArrayIndexOutOfBoundsException - Key: ZOOKEEPER-780 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-780 Project: Zookeeper Issue Type: Bug Components: scripts Affects Versions: 3.3.1 Environment: Linux Ubuntu running in VMPlayer on top of Windows XP Reporter: Miguel Correia Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-780.patch, ZOOKEEPER-780.patch, ZOOKEEPER-780.patch I'm starting to play with Zookeeper, so I'm still running it in standalone mode. This is not a big issue, but here it goes for the record. I ran zkCli.sh to run some commands on the server. I created a znode /groups. When I tried to create a znode client_1 inside /groups, I forgot to include the data: an exception was generated and zkCli.sh crashed, instead of just showing an error. I tried a few variations, and it seems the problem is omitting the data. A copy of the screen:
[zk: localhost:2181(CONNECTED) 3] create /groups firstgroup
Created /groups
[zk: localhost:2181(CONNECTED) 4] create -e /groups/client_1
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:678)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
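The crash comes from indexing past the end of the argument array when the data is omitted (index 3 for `create -e /groups/client_1`). A hedged Java sketch of validating the token count first; CreateCommandParser is hypothetical and not ZooKeeperMain's code, and it accepts only `create [-s] [-e] path data` (ACL handling is omitted).

```java
/** Hypothetical parser for "create [-s] [-e] path data" that never indexes out of bounds. */
final class CreateCommandParser {
    static final class Parsed {
        final boolean sequential;
        final boolean ephemeral;
        final String path;
        final String data;

        Parsed(boolean sequential, boolean ephemeral, String path, String data) {
            this.sequential = sequential;
            this.ephemeral = ephemeral;
            this.path = path;
            this.data = data;
        }
    }

    /** Returns the parsed command, or throws IllegalArgumentException with a usage message. */
    static Parsed parse(String[] args) {
        int i = 1; // args[0] is "create"
        boolean seq = false;
        boolean eph = false;
        while (i < args.length && args[i].startsWith("-")) {
            if (args[i].equals("-s")) {
                seq = true;
            } else if (args[i].equals("-e")) {
                eph = true;
            } else {
                throw new IllegalArgumentException("unknown flag " + args[i]);
            }
            i++;
        }
        // Check the count instead of letting args[i + 1] throw ArrayIndexOutOfBoundsException.
        if (args.length - i < 2) {
            throw new IllegalArgumentException("usage: create [-s] [-e] path data");
        }
        return new Parsed(seq, eph, args[i], args[i + 1]);
    }
}
```

The shell loop can catch IllegalArgumentException, print the usage message, and keep the session alive instead of exiting.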
[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930548#action_12930548 ] Patrick Hunt commented on ZOOKEEPER-909:

Hi Thomas. We're still on shaky legs getting the patch queue up and working again. That shouldn't keep us from getting this committed, though. Re javadoc, this is not an issue for the other patches afaict; any idea why it's only showing up for this patch? There are two sets of tests, Java and the C client binding. Unfortunately Hudson currently does not highlight C failures on the summary page; you need to check the console output (usually the raw one) when the tests fail but the Java tests don't. Looking at the console I see:

[exec] [exec] ZooKeeper server process failed
ZooKeeper server NOT started
Running

I've notified Nigel about this to see if he has any insight (I saw it on a couple of other JIRAs). So far he hasn't had a chance to look into it.

Extract NIO specific code from ClientCnxn
Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch

This patch is mostly the same as my last one for ZOOKEEPER-823, minus everything Netty-related. This means this patch only extracts all NIO-specific code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket. I've redone this patch from current trunk step by step and couldn't find any logical error. I've already done a couple of successful test runs and will continue to do so tonight. It would be nice if we could apply this patch to trunk as soon as possible. This allows us to continue working on the Netty integration without blocking the ClientCnxn class. Adding Netty after this patch should only be a matter of adding the ClientCnxnSocketNetty class with the appropriate test cases. You could help me by reviewing the patch and by running it on whatever test server you have available. Please send any complete failure log you encounter to thomas at koch point ro. Thx! Update: so far, I've collected 8 successful builds in a row!
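The refactoring described above is essentially a strategy pattern: the connection logic depends only on an abstract socket type, and the NIO (later Netty) specifics live in subclasses. A heavily simplified sketch follows; every class and method name here is hypothetical and merely stands in for ClientCnxn, ClientCnxnSocket and ClientCnxnSocketNIO, whose real methods are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical transport abstraction (stand-in for an abstract client socket). */
abstract class CnxnTransport {
    abstract void connect(String host, int port);
    abstract void send(byte[] packet);
    abstract boolean isConnected();
    abstract void close();
}

/** Test double standing in for an NIO- or Netty-backed implementation. */
class RecordingTransport extends CnxnTransport {
    final List<byte[]> sent = new ArrayList<>();
    private boolean connected;

    @Override
    void connect(String host, int port) { connected = true; }

    @Override
    void send(byte[] packet) {
        if (!connected) {
            throw new IllegalStateException("not connected");
        }
        sent.add(packet);
    }

    @Override
    boolean isConnected() { return connected; }

    @Override
    void close() { connected = false; }
}

/** Transport-agnostic connection logic: it never touches NIO types directly. */
class SketchConnection {
    private final CnxnTransport transport;

    SketchConnection(CnxnTransport transport) { this.transport = transport; }

    void start(String host, int port) { transport.connect(host, port); }

    void submit(byte[] request) { transport.send(request); }

    void shutdown() { transport.close(); }
}
```

Swapping transports then only means passing a different CnxnTransport subclass to the constructor, which is what makes a test double like RecordingTransport possible.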
[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930553#action_12930553 ] Patrick Hunt commented on ZOOKEEPER-909:

Better, but why is javadoc failing for this patch and not for the others?

Extract NIO specific code from ClientCnxn
Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch
[jira] Commented: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930648#action_12930648 ] Patrick Hunt commented on ZOOKEEPER-896:

Hi Botond, if this is ready to go (you think it's ready for review/commit), please click the "Submit Patch" link on the left-hand side of this page. That will trigger the necessary workflow. Thanks!

Improve C client to support dynamic authentication schemes
Key: ZOOKEEPER-896 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Reporter: Botond Hejj Assignee: Botond Hejj Fix For: 3.4.0 Attachments: ZOOKEEPER-896.patch

When we started exploring ZooKeeper for our requirements, we found that the authentication mechanism is not flexible enough. We want to use Kerberos for authentication, but using the current API we ran into a few problems. The idea is that we get a Kerberos token on the client side and then send that token to the server with a kerberos scheme. A server-side authentication plugin can use that token to authenticate the client and also use the token for authorization. We ran into two problems with this approach:

1. A different Kerberos token is needed for each server the client can connect to, since Kerberos uses mutual authentication. That means when the client acquires this Kerberos token, it has to know which server it connects to and generate the token accordingly. The client currently can't generate a token for a specific server; the token stored in the auth_info is used for all the servers.
2. The Kerberos token might have an expiry time, so if the client loses the connection to the server and then tries to reconnect, it should acquire a new token. That is currently not possible, since the token is stored in auth_info and reused for every connection.

The problem can be solved if we allow the client to register a callback for authentication instead of a static token. This can be a callback with an argument that passes the current host string. The ZooKeeper client code could call this callback before it sends the authentication info to the server, to get a fresh server-specific token. This would solve our problem with Kerberos authentication and could also be used for other, more dynamic authentication schemes. The solution could be generalized to the Java client as well.
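The proposal targets the C client, but since the reporter suggests generalizing it to the Java client, here is a minimal Java sketch of the callback idea. AuthTokenProvider, ReconnectingClient and their methods are hypothetical names, not part of the ZooKeeper API; the point is only that the token is requested per host and per connection instead of being stored once in something like auth_info.

```java
/**
 * Hypothetical callback: returns a fresh, server-specific auth token.
 * Not part of the ZooKeeper API.
 */
interface AuthTokenProvider {
    byte[] tokenFor(String host);
}

/** Hypothetical client fragment that asks for a new token on every (re)connect. */
class ReconnectingClient {
    private final String scheme;
    private final AuthTokenProvider provider;
    private byte[] lastToken;

    ReconnectingClient(String scheme, AuthTokenProvider provider) {
        this.scheme = scheme;
        this.provider = provider;
    }

    /** Called for each connection attempt, before auth info is sent to the server. */
    void onConnected(String host) {
        // The token is computed per host and per connection, so an expired
        // or server-specific token is never reused for another server.
        lastToken = provider.tokenFor(host);
        // A real client would now send (scheme, lastToken) to the server.
    }

    String scheme() { return scheme; }

    byte[] lastToken() { return lastToken; }
}
```

With a Kerberos-backed provider, tokenFor(host) would request a service ticket for that particular server; that integration is outside this sketch.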
[jira] Commented: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930654#action_12930654 ] Patrick Hunt commented on ZOOKEEPER-905:

Hi Nicholas, thanks! You've currently got the workflow in "In Progress" mode; iirc this happens when you resume progress or something like that (we typically don't use that part of the workflow; if the issue is assigned to you, the assumption is that you are working on it unless we hear otherwise). You'll need to take the JIRA out of "In Progress" mode and then select "Submit Patch" for this to go to the qabot and then get reviewed/committed by a committer. One other FYI: this JIRA is assigned to be fixed in 3.4.0 (current trunk, i.e. the next full release from trunk). Typically you'd want to create the patch against svn trunk. Also, the patch queue on Hudson (qabot) will only test patches against trunk. Not a big deal (your patch may apply against trunk even if created from 3.3.1), but I just wanted to give you that heads-up. Thanks again. Regards.

enhance zkServer.sh for easier zookeeper automation-izing
Key: ZOOKEEPER-905 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905 Project: Zookeeper Issue Type: Improvement Components: scripts Reporter: Nicholas Harteau Assignee: Nicholas Harteau Priority: Minor Fix For: 3.4.0 Attachments: zkServer.sh.diff

zkServer.sh is good at starting ZooKeeper and figuring out the right options to pass along. Unfortunately, if you want to wrap ZooKeeper startup/shutdown in any significant way, you have to reimplement a bunch of the logic there. The attached patch addresses a couple of simple issues:

1. Add a 'start-foreground' option to zkServer.sh - this allows things that expect to manage a foregrounded process (daemontools, launchd, etc.) to use zkServer.sh instead of rolling their own to launch ZooKeeper.
2. Add a 'print-cmd' option to zkServer.sh - rather than launching ZooKeeper from the script, just give me the command you'd normally use to exec ZooKeeper. I found this useful when writing automation to start/stop ZooKeeper as part of smoke testing ZooKeeper-based applications.
3. Deal more gracefully with supplying alternate configuration files to ZooKeeper - currently the script assumes all config files reside in $ZOOCFGDIR - also useful for smoke testing.
4. Communicate extra info (JMX enabled) about ZooKeeper on STDERR rather than STDOUT (necessary for #2).
5. Fix an issue on Mac OS where readlink doesn't have the '-f' option.
[jira] Commented: (ZOOKEEPER-784) server-side functionality for read-only mode
[ https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930692#action_12930692 ] Patrick Hunt commented on ZOOKEEPER-784:

No worries, I'd like to get this in given you've done a bunch of work on it; the qabot just flagged it because it recently started working again. Thanks.

server-side functionality for read-only mode
Key: ZOOKEEPER-784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784 Project: Zookeeper Issue Type: Sub-task Components: server Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch

As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create ReadOnlyZooKeeperServer, which comes into play when a peer is partitioned.
[jira] Updated: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-928:

Component/s: server, quorum
Priority: Critical (was: Major)
Fix Version/s: 3.3.3

Follower should stop following and start FLE if it does not receive pings from the leader
Key: ZOOKEEPER-928 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Critical Fix For: 3.3.3, 3.4.0

In Follower.followLeader(), after syncing with the leader, the follower does:

while (self.isRunning()) {
    readPacket(qp);
    processPacket(qp);
}

It looks like it relies on socket timeout expiry to figure out whether the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if the leader has a software hang but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader. We should keep track of pings received from the leader and, if we haven't seen a ping packet from the leader for (syncLimit * tickTime), give up following the leader.
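A minimal sketch of the proposed bookkeeping, assuming a millisecond clock and that the follower notifies the watchdog each time it processes a PING from the leader. LeaderPingWatchdog is a hypothetical class, not existing ZooKeeper code; the clock is injected so the logic can be tested without sleeping.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical watchdog: remembers when the last PING arrived from the
 * leader and reports the leader as lost once none has been seen for more
 * than syncLimit * tickTime milliseconds.
 */
class LeaderPingWatchdog {
    private final long timeoutMs;
    private final LongSupplier clockMs; // injectable clock, e.g. System::currentTimeMillis
    private long lastPingMs;

    LeaderPingWatchdog(int syncLimit, int tickTimeMs, LongSupplier clockMs) {
        this.timeoutMs = (long) syncLimit * tickTimeMs;
        this.clockMs = clockMs;
        // Start the silence window when following begins.
        this.lastPingMs = clockMs.getAsLong();
    }

    /** Call whenever a PING packet from the leader is processed. */
    void onPing() {
        lastPingMs = clockMs.getAsLong();
    }

    /** True once the leader has been silent for longer than the timeout. */
    boolean leaderLost() {
        return clockMs.getAsLong() - lastPingMs > timeoutMs;
    }
}
```

This only helps if the blocking readPacket() call wakes up periodically (for example via a socket read timeout), so that the follower gets a chance to call leaderLost() and go back to leader election.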
[jira] Updated: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-909:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks for following through on this, Thomas! I look forward to seeing the rest of it. Regards.

Extract NIO specific code from ClientCnxn
Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930905#action_12930905 ] Patrick Hunt commented on ZOOKEEPER-928:

According to this, it's not a bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4614802

Specifically: "The read methods in SocketChannel (and DatagramChannel) do not support timeouts. If you need the timeout functionality then use the read methods of the associated Socket (or DatagramSocket) object."

Note that this was asked/answered a while ago, though I suspect it's still true.

Follower should stop following and start FLE if it does not receive pings from the leader
Key: ZOOKEEPER-928 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Critical
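Per the Sun bug above, the timeout has to come from java.net.Socket rather than SocketChannel. Below is a small self-contained demonstration using only the JDK. The peer accepts the connection but never writes, standing in for a hung leader whose TCP connection is still open, and a blocking read gives up once setSoTimeout(...) expires. SoTimeoutDemo is an illustrative name; this says nothing about how the follower's socket is actually configured.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Shows a blocking Socket read giving up after SO_TIMEOUT on a silent peer. */
class SoTimeoutDemo {
    static boolean readTimesOut(int timeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) { // accepted, but never written to
            client.setSoTimeout(timeoutMs);
            InputStream in = client.getInputStream();
            in.read();      // blocks until data, EOF, or the timeout
            return false;   // data or EOF arrived, so no timeout occurred
        } catch (SocketTimeoutException e) {
            return true;    // the read gave up after timeoutMs
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

After a SocketTimeoutException the socket remains valid, so a read loop could catch it, check how long the leader has been silent, and either retry the read or give up on the leader.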
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930188#action_12930188 ] Patrick Hunt commented on ZOOKEEPER-925:

There are a few issues:

A while back I looked at replacing Forrest with Python's Sphinx. The conversion itself was pretty straightforward, given there was a script that did most of the work. I don't see a script for Forrest-to-Confluence, but perhaps we could re-purpose the other one I used, or just do a manual search/replace of the tags. It will take some work to convert the formats, but not a huge amount given the size of our Forrest-based docs.

Another issue was that we would lose the Hadoop look and feel. This was really the insurmountable problem when I looked at it before. However, now that we are moving out of Hadoop into our own TLP space, I don't see that as an issue. We probably want our own look and feel anyway.

To move to Maven-based site generation, we just need to create a top-level pom.xml file and a top-level src/site directory that contains the content and the site descriptor (how to generate the site, what links, etc.; that's all configurable). We can then tell people to use both ant and mvn for the time being: initially mvn would be used only for mvn site (site/doc generation), and ant for everything we do today.

I can create a patch that does Maven site generation if there's sufficient interest (I don't want to waste my time, though, if everyone's not on board). What do you think?

Consider maven site generation to replace our forrest site and documentation generation
Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt
[jira] Commented: (ZOOKEEPER-926) Fork Hadoop common's test-patch.sh and modify for Zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930189#action_12930189 ] Patrick Hunt commented on ZOOKEEPER-926:

+1 Thanks Nigel Giri!

Fork Hadoop common's test-patch.sh and modify for Zookeeper
Key: ZOOKEEPER-926 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-926 Project: Zookeeper Issue Type: Improvement Components: build Reporter: Nigel Daley Fix For: 3.4.0 Attachments: ZOOKEEPER-926.patch

ZooKeeper currently uses the test-patch.sh script from the Hadoop nightly dir. This is now out of date. I propose we just copy the updated one in Hadoop common and then modify it for ZK. This will also help as ZK moves out of Hadoop to its own TLP.
[jira] Assigned: (ZOOKEEPER-926) Fork Hadoop common's test-patch.sh and modify for Zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-926:

Assignee: Nigel Daley

Fork Hadoop common's test-patch.sh and modify for Zookeeper
Key: ZOOKEEPER-926 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-926 Project: Zookeeper Issue Type: Improvement Components: build Reporter: Nigel Daley Assignee: Nigel Daley Fix For: 3.4.0 Attachments: ZOOKEEPER-926.patch
[jira] Updated: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-922:

Status: Patch Available (was: Open)

enable faster timeout of sessions in case of unexpected socket disconnect
Key: ZOOKEEPER-922 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.4.0 Attachments: ZOOKEEPER-922.patch

When a client connection is closed due to a socket error, rather than by the client calling close explicitly, it would be nice to let the session associated with that client time out faster than the negotiated session timeout. This would allow a ZooKeeper ensemble acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while still allowing a longer heartbeat-based timeout for Java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout.
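A simplified sketch of the proposed policy, assuming session expiry is tracked as an absolute deadline per session. SketchSessionTracker and its methods are hypothetical, not the server's actual session-tracking API. Normal activity (including a reconnect to another server) pushes the deadline out by the negotiated timeout, while an unexpected disconnect pulls it in to minSessionTimeout.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-session deadline bookkeeping for the proposal above. */
class SketchSessionTracker {
    private final long minSessionTimeoutMs;
    private final Map<Long, Long> negotiatedTimeoutMs = new HashMap<>();
    private final Map<Long, Long> expiresAtMs = new HashMap<>();

    SketchSessionTracker(long minSessionTimeoutMs) {
        this.minSessionTimeoutMs = minSessionTimeoutMs;
    }

    void addSession(long sessionId, long negotiatedMs, long nowMs) {
        negotiatedTimeoutMs.put(sessionId, negotiatedMs);
        expiresAtMs.put(sessionId, nowMs + negotiatedMs);
    }

    /** Client activity or reconnect: extend by the negotiated timeout. */
    void touch(long sessionId, long nowMs) {
        expiresAtMs.put(sessionId, nowMs + negotiatedTimeoutMs.get(sessionId));
    }

    /** Connection dropped by a socket error rather than an explicit close. */
    void onUnexpectedDisconnect(long sessionId, long nowMs) {
        long shortened = nowMs + Math.min(minSessionTimeoutMs, negotiatedTimeoutMs.get(sessionId));
        // Never push out a deadline that is already sooner.
        expiresAtMs.put(sessionId, Math.min(expiresAtMs.get(sessionId), shortened));
    }

    boolean isExpired(long sessionId, long nowMs) {
        return nowMs >= expiresAtMs.get(sessionId);
    }
}
```

With a 30 s negotiated timeout and a 4 s minimum, a crashed client's ephemeral nodes would go away about 4 s after the socket error instead of 30 s, while a client that is merely paused in GC keeps its full timeout.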
[jira] Commented: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930191#action_12930191 ] Patrick Hunt commented on ZOOKEEPER-905:

Hi Nicholas, actually the issue is that you need to use svn to create the diff (Hudson doesn't know what the .orig file is). See this page for instructions (basically: check out from svn, make the change, run svn diff): http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

enhance zkServer.sh for easier zookeeper automation-izing
Key: ZOOKEEPER-905 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905 Project: Zookeeper Issue Type: Improvement Components: scripts Reporter: Nicholas Harteau Assignee: Nicholas Harteau Priority: Minor Fix For: 3.4.0 Attachments: zkServer.sh.diff
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930193#action_12930193 ] Patrick Hunt commented on ZOOKEEPER-922:

Hi Camille, the patch has to be created from the topmost directory (trunk) for Hudson to apply it correctly; please see http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute (basically: check out trunk, make your changes, run svn diff at the top level). Thanks!

enable faster timeout of sessions in case of unexpected socket disconnect
Key: ZOOKEEPER-922 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.4.0 Attachments: ZOOKEEPER-922.patch
[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930195#action_12930195 ] Patrick Hunt commented on ZOOKEEPER-909:

Hi Thomas, thanks! One more request: can you make this a single diff? Otherwise the Hudson patch queue doesn't work properly. See http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute (you probably need to svn add the new file for svn diff to pick it up). Then cancel/submit the JIRA again to trigger the workflow. Thanks!

Extract NIO specific code from ClientCnxn
Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch
[jira] Commented: (ZOOKEEPER-913) Version parser fails to parse 3.3.2-dev from build.xml.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930216#action_12930216 ] Patrick Hunt commented on ZOOKEEPER-913:

I propose we support the same format that Maven supports: http://www.sonatype.com/books/mvnref-book/reference/pom-relationships-sect-pom-syntax.html

major version.minor version.incremental version-qualifier

This is close to what we do, but allows any qualifier after the x.y.z-.

Version parser fails to parse 3.3.2-dev from build.xml.
Key: ZOOKEEPER-913 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-913 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Anthony Urso Priority: Critical Fix For: 3.4.0 Attachments: zk-build.patch, zk-version.patch

Cannot build 3.3.1 from the release tarball due to the VerGen parser's inability to parse 3.3.2-dev.

version-info:
[java] All version-related parameters must be valid integers!
[java] Exception in thread "main" java.lang.NumberFormatException: For input string: "2-dev"
[java]     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
[java]     at java.lang.Integer.parseInt(Integer.java:481)
[java]     at java.lang.Integer.parseInt(Integer.java:514)
[java]     at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131)
[java] Java Result: 1
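A minimal sketch of parsing the proposed major.minor.incremental-qualifier format with a regular expression, so that 3.3.2-dev yields 3, 3, 2 and the qualifier dev instead of a NumberFormatException on "2-dev". SketchVersion is a hypothetical class, not VerGen's code, and it deliberately requires all three numeric components.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for "major.minor.incremental[-qualifier]" versions. */
final class SketchVersion {
    private static final Pattern FORMAT =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-(.+))?");

    final int major;
    final int minor;
    final int incremental;
    final String qualifier; // null when there is no "-qualifier" suffix

    private SketchVersion(int major, int minor, int incremental, String qualifier) {
        this.major = major;
        this.minor = minor;
        this.incremental = incremental;
        this.qualifier = qualifier;
    }

    static SketchVersion parse(String s) {
        Matcher m = FORMAT.matcher(s.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException(
                    "expected major.minor.incremental[-qualifier], got: " + s);
        }
        // Only the numeric groups go through parseInt; the qualifier stays a string.
        return new SketchVersion(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4));
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + incremental
                + (qualifier == null ? "" : "-" + qualifier);
    }
}
```

Keeping the qualifier as an opaque string matches the "any qualifier after the x.y.z-" idea; ordering rules between qualifiers are out of scope here.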
[jira] Commented: (ZOOKEEPER-902) Fix findbug issue in trunk Malicious code vulnerability
[ https://issues.apache.org/jira/browse/ZOOKEEPER-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930222#action_12930222 ] Patrick Hunt commented on ZOOKEEPER-902:

The patch queue now has a setting:

(10:28:53 AM) nigelcdn: There's a new file in src/java/test/bin/test-patch.properties in which is defined the acceptable number of warnings
(10:29:03 AM) nigelcdn: use it very judiciously ;-)

After this issue is fixed we should adjust that file back to 0.

Fix findbug issue in trunk Malicious code vulnerability
Key: ZOOKEEPER-902 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-902 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.4.0 Reporter: Patrick Hunt Priority: Minor Fix For: 3.4.0

https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/970/artifact/trunk/findbugs/zookeeper-findbugs-report.html#Warnings_MALICIOUS_CODE

Malicious code vulnerability Warnings
Code: MS
Warning: org.apache.zookeeper.server.quorum.LeaderElection.epochGen isn't final but should be
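For context, FindBugs' MS ("malicious code") warnings concern static fields that other code could modify, and the usual fix for "isn't final but should be" is to declare the field final. Below is a generic before/after sketch with hypothetical class names (the actual type and declaration of epochGen are not shown in this report), plus a reflection check of the modifier.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Random;

/** Before: any code that can see this static field can replace the object. */
class FlaggedHolder {
    static Random generator = new Random();
}

/** After: the reference can no longer be reassigned once initialized. */
class FixedHolder {
    static final Random generator = new Random();
}

/** Helper for checking the declared modifiers of a field via reflection. */
class FieldChecks {
    static boolean isFinal(Class<?> type, String fieldName) {
        try {
            Field f = type.getDeclaredField(fieldName);
            return Modifier.isFinal(f.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no field " + fieldName + " in " + type.getName(), e);
        }
    }
}
```

Note that final only protects the reference; the shared object behind it can still be used (and, if mutable, modified) by anyone who can reach it.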
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930224#action_12930224 ] Patrick Hunt commented on ZOOKEEPER-925:

We probably wouldn't generate PDFs for the web site; no one seems to do that anymore (although it's possible if someone wants to do it explicitly for some reason).

We check in the source, that's a given. We also check in the generated site/docs, and for a reason: in the Forrest timeframe it was mainly because using Forrest is a pita. ;-) With Maven that's less of a concern. For Whirr we currently don't check in the generated docs, but we are thinking of doing so to lower the bar for new users. It doesn't really matter much to me; we could try not committing the generated docs first, then see if it's an issue.

Consider maven site generation to replace our forrest site and documentation generation
Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930226#action_12930226 ] Patrick Hunt commented on ZOOKEEPER-925:
btw, given that you just check out the repo and literally type mvn site (maven handles all the dependency downloads), there's basically no reason to commit the generated docs.
Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt
[jira] Commented: (ZOOKEEPER-850) Switch from log4j to slf4j
[ https://issues.apache.org/jira/browse/ZOOKEEPER-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930229#action_12930229 ] Patrick Hunt commented on ZOOKEEPER-850:
FYI, HBase's JIRA for a similar change: HBASE-2608 (it looks like this patch is proposing something similar to (2) in the HBase JIRA?)
Switch from log4j to slf4j -- Key: ZOOKEEPER-850 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-850 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Olaf Krische Assignee: Olaf Krische Fix For: 3.4.0 Attachments: ZOOKEEPER-3.3.1-log4j-slf4j-20101031.patch.bz2, ZOOKEEPER-3.4.0-log4j-slf4j-20101102.patch.bz2
Hello, I would like to see slf4j integrated into ZooKeeper instead of relying explicitly on log4j. slf4j is an abstract logging framework. There are adapters from slf4j to many logger implementations, one of them being log4j. I don't want to make the decision about which log engine to use so early. This would help me embed ZooKeeper in my own applications (which use a different logger implementation, but with slf4j as the basis). What do you think? (As far as I can see, these slf4j requests are flooding all the other Apache projects as well :-) Maybe for 3.4 or 4.0? I can offer a patchset; I have experience with such a migration already. :-)
[jira] Updated: (ZOOKEEPER-850) Switch from log4j to slf4j
[ https://issues.apache.org/jira/browse/ZOOKEEPER-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-850: --- Attachment: ZOOKEEPER-850.patch
Attaching Olaf's patch as a raw patch file so that hudson can do its magic.
Switch from log4j to slf4j -- Key: ZOOKEEPER-850 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-850 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Olaf Krische Assignee: Olaf Krische Fix For: 3.4.0 Attachments: ZOOKEEPER-3.3.1-log4j-slf4j-20101031.patch.bz2, ZOOKEEPER-3.4.0-log4j-slf4j-20101102.patch.bz2, ZOOKEEPER-850.patch
[jira] Commented: (ZOOKEEPER-850) Switch from log4j to slf4j
[ https://issues.apache.org/jira/browse/ZOOKEEPER-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930232#action_12930232 ] Patrick Hunt commented on ZOOKEEPER-850:
Olaf, re 5: could you add similar comments to the release notes section of this JIRA? We'll need them when doing the release itself.
Switch from log4j to slf4j -- Key: ZOOKEEPER-850 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-850 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Olaf Krische Assignee: Olaf Krische Fix For: 3.4.0 Attachments: ZOOKEEPER-3.3.1-log4j-slf4j-20101031.patch.bz2, ZOOKEEPER-3.4.0-log4j-slf4j-20101102.patch.bz2, ZOOKEEPER-850.patch
[jira] Updated: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-925: --- Attachment: ZOOKEEPER-925.patch
Apply this patch, then in the top-level directory type mvn site:site, then open target/site/index.html in your browser. Notice the index.confluence source page; try editing it (Confluence wiki markup, see http://maven.apache.org/doxia/modules/index.html#Confluence) and regenerating/viewing the updated site. site.xml controls the layout and which links are put into the generated site.
Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt Attachments: ZOOKEEPER-925.patch
[jira] Assigned: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-925: -- Assignee: Patrick Hunt
Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt Assignee: Patrick Hunt Attachments: ZOOKEEPER-925.patch
[jira] Assigned: (ZOOKEEPER-913) Version parser fails to parse 3.3.2-dev from build.xml.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-913: -- Assignee: Patrick Hunt (was: Anthony Urso)
Version parser fails to parse 3.3.2-dev from build.xml. - Key: ZOOKEEPER-913 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-913 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Patrick Hunt Priority: Critical Fix For: 3.4.0 Attachments: zk-build.patch, zk-version.patch
Cannot build 3.3.1 from the release tarball due to the VerGen parser's inability to parse 3.3.2-dev.
version-info:
    [java] All version-related parameters must be valid integers!
    [java] Exception in thread "main" java.lang.NumberFormatException: For input string: "2-dev"
    [java]     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    [java]     at java.lang.Integer.parseInt(Integer.java:481)
    [java]     at java.lang.Integer.parseInt(Integer.java:514)
    [java]     at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131)
    [java] Java Result: 1
[jira] Updated: (ZOOKEEPER-913) Version parser fails to parse 3.3.2-dev from build.xml.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-913: --- Fix Version/s: 3.3.3
Version parser fails to parse 3.3.2-dev from build.xml. - Key: ZOOKEEPER-913 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-913 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Patrick Hunt Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zk-build.patch, zk-version.patch
[jira] Updated: (ZOOKEEPER-913) Version parser fails to parse 3.3.2-dev from build.xml.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-913: --- Status: Patch Available (was: Open)
Version parser fails to parse 3.3.2-dev from build.xml. - Key: ZOOKEEPER-913 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-913 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Patrick Hunt Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zk-build.patch, zk-version.patch, ZOOKEEPER-913.patch
[jira] Updated: (ZOOKEEPER-913) Version parser fails to parse 3.3.2-dev from build.xml.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-913: --- Attachment: ZOOKEEPER-913.patch
This most recent patch implements support for a version format similar to what maven does. I added tests and also verified it on the command line (build) using the -Dversion option. Seems to work OK.
Version parser fails to parse 3.3.2-dev from build.xml. - Key: ZOOKEEPER-913 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-913 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Patrick Hunt Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zk-build.patch, zk-version.patch, ZOOKEEPER-913.patch
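The version format discussed in ZOOKEEPER-913 and ZOOKEEPER-891 (a numeric x.y.z optionally followed by a qualifier such as -dev) can be sketched in Java as below. This is a minimal, hypothetical parser, not the code in the attached patch or in VerGen; the class name Version and the exact accepted grammar are assumptions. What it illustrates is that only the numeric groups are handed to Integer.parseInt, which is where the original NumberFormatException on "2-dev" came from.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical maven-style version parser: "major.minor.micro" optionally
 * followed by "-qualifier" (e.g. "3.3.2-dev"). Not ZooKeeper's VerGen.
 */
final class Version {
    // Three numeric components, then an optional "-" and a free-form qualifier.
    private static final Pattern FORMAT =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-(.+))?");

    final int major;
    final int minor;
    final int micro;
    final String qualifier; // null when there is no "-..." suffix

    private Version(int major, int minor, int micro, String qualifier) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
        this.qualifier = qualifier;
    }

    /** Parses the string or throws IllegalArgumentException on bad input. */
    static Version parse(String s) {
        Matcher m = FORMAT.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                    "Invalid version number format, must be x.y.z(-qualifier): " + s);
        }
        // Only the purely numeric groups reach Integer.parseInt, so a
        // suffix like "-dev" can no longer cause a NumberFormatException.
        return new Version(Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4));
    }
}
```

With this grammar, "3.3.2-dev" parses to 3, 3, 2 plus the qualifier "dev", while a form like "x.y.z+1" would still be rejected unless the pattern is widened.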
[jira] Resolved: (ZOOKEEPER-891) Allow non-numeric version strings
[ https://issues.apache.org/jira/browse/ZOOKEEPER-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt resolved ZOOKEEPER-891. Resolution: Duplicate
This is a dup of ZOOKEEPER-913. See the patch there (it handles this case).
Allow non-numeric version strings - Key: ZOOKEEPER-891 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-891 Project: Zookeeper Issue Type: Improvement Components: build Reporter: Eli Collins Priority: Minor Fix For: 3.4.0
Non-numeric version strings (e.g. -dev) are not currently accepted: you either get an error (Invalid version number format, must be x.y.z) or, if you pass x.y.z-dev or x.y.z+1, you get a NumberFormatException. It would be useful to allow non-numeric versions.
{noformat}
version-info:
    [java] All version-related parameters must be valid integers!
    [java] Exception in thread "main" java.lang.NumberFormatException: For input string: "3-dev"
    [java]     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    [java]     at java.lang.Integer.parseInt(Integer.java:458)
    [java]     at java.lang.Integer.parseInt(Integer.java:499)
    [java]     at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131)
    [java] Java Result: 1
{noformat}
[jira] Commented: (ZOOKEEPER-920) L7 (application layer) ping support
[ https://issues.apache.org/jira/browse/ZOOKEEPER-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930399#action_12930399 ] Patrick Hunt commented on ZOOKEEPER-920:
In general, unit tests for the C binding are an area where we could use more help (more tests), if you're interested. :-)
L7 (application layer) ping support --- Key: ZOOKEEPER-920 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-920 Project: Zookeeper Issue Type: New Feature Components: c client Affects Versions: 3.3.1 Reporter: Chang Song Assignee: Chang Song Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-920.patch
ZooKeeper is used in applications where fault tolerance is important. Its client I/O thread sends/receives heartbeats to/from the ZooKeeper ensemble to stay connected. However, a healthy heartbeat does not always mean that the application using the ZooKeeper client is in good health; it only means that the ZK client thread is in good health. So I needed something that can be tagged onto the ZooKeeper ping to represent L7 (application) health as well. I have modified the C client source to support this in a minimal way. I am new to ZooKeeper, so please review this code. I am actually using this code in our in-house solution. https://github.com/tru64ufs/zookeeper/commit/2196d6d5114a2fd2c0a3bc9a55f4494d47d2aece Thank you very much.
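One possible reading of the ZOOKEEPER-920 idea, sketched in Java even though the patch itself targets the C client: before each heartbeat, the client consults an application-supplied health check and withholds the ping when the application reports itself unhealthy, so that the server eventually expires the session. The class HealthAwarePinger, its methods, and the withhold-on-failure policy are all hypothetical; the linked commit may tag health information onto the ping in a different way.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch: tie heartbeats to application (L7) health.
 * If the application reports unhealthy, the ping is withheld, so the
 * server will eventually expire the session and its ephemeral nodes.
 */
final class HealthAwarePinger {
    private final BooleanSupplier appHealthy; // supplied by the embedding application
    private int pingsSent = 0;
    private int pingsSkipped = 0;

    HealthAwarePinger(BooleanSupplier appHealthy) {
        this.appHealthy = appHealthy;
    }

    /** Called on each heartbeat tick; returns true if a ping was sent. */
    boolean onHeartbeatTick() {
        if (!appHealthy.getAsBoolean()) {
            pingsSkipped++; // the client thread is fine, but the application is not
            return false;
        }
        sendPing();
        return true;
    }

    private void sendPing() {
        // Placeholder: a real client would write a ping packet to its socket here.
        pingsSent++;
    }

    int pingsSent() { return pingsSent; }
    int pingsSkipped() { return pingsSkipped; }
}
```

The design choice here is that health is signalled implicitly (by the absence of pings) rather than by a new field in the ping, which avoids any wire-protocol change at the cost of slower detection (up to one session timeout).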
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930407#action_12930407 ] Patrick Hunt commented on ZOOKEEPER-922:
@camille NP, although it makes it easier for us (reviewers) if all the patches are consistent; for future reference, then. Thanks. P.S. You might get a more insightful review if you post to Apache's new Review Board server: https://reviews.apache.org Regards.
enable faster timeout of sessions in case of unexpected socket disconnect - Key: ZOOKEEPER-922 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.4.0 Attachments: ZOOKEEPER-922.patch
When a client connection is closed due to a socket error instead of the client explicitly calling close, it would be nice to let the session associated with that client time out faster than the negotiated session timeout. This would let a ZooKeeper ensemble that is acting as a dynamic discovery provider quickly remove ephemeral nodes for crashed clients, while still allowing a longer heartbeat-based timeout for Java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout.
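A minimal sketch of the ZOOKEEPER-922 proposal, assuming a simplified in-memory tracker (SessionExpiryTracker and its methods are hypothetical, not ZooKeeper's SessionTracker API): normal activity extends a session by its negotiated timeout, while an unexpected socket disconnect caps the remaining lifetime at minSessionTimeout.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified session tracker: on an unexpected socket
 * disconnect, a session's expiry is shortened to minSessionTimeout
 * instead of waiting out the full negotiated timeout.
 */
final class SessionExpiryTracker {
    private final int minSessionTimeoutMs;
    private final Map<Long, Integer> negotiatedTimeoutMs = new HashMap<>();
    private final Map<Long, Long> expiresAtMs = new HashMap<>();

    SessionExpiryTracker(int minSessionTimeoutMs) {
        this.minSessionTimeoutMs = minSessionTimeoutMs;
    }

    void addSession(long sessionId, int timeoutMs, long nowMs) {
        negotiatedTimeoutMs.put(sessionId, timeoutMs);
        expiresAtMs.put(sessionId, nowMs + timeoutMs);
    }

    /** Any client activity (including pings) pushes expiry out by the negotiated timeout. */
    void touch(long sessionId, long nowMs) {
        expiresAtMs.put(sessionId, nowMs + negotiatedTimeoutMs.get(sessionId));
    }

    /** Socket closed by an error rather than an explicit close(): expire sooner. */
    void onUnexpectedDisconnect(long sessionId, long nowMs) {
        long shortened = nowMs + Math.min(minSessionTimeoutMs, negotiatedTimeoutMs.get(sessionId));
        // Only ever shorten the remaining lifetime, never extend it.
        expiresAtMs.put(sessionId, Math.min(expiresAtMs.get(sessionId), shortened));
    }

    boolean isExpired(long sessionId, long nowMs) {
        return nowMs >= expiresAtMs.get(sessionId);
    }
}
```

In this model a client that reconnects before the shortened deadline calls touch() and gets its full negotiated timeout back, while a client in a long GC pause keeps its socket open and is never shortened.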
[jira] Updated: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-906: --- Status: Open (was: Patch Available)
Cancelling patch, needs test(s).
Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client --- Key: ZOOKEEPER-906 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Reporter: Radu Marin Assignee: Radu Marin Fix For: 3.4.0 Attachments: ZOOKEEPER-906.patch Original Estimate: 24h Remaining Estimate: 24h
Currently, when a C client gets disconnected, it retries a couple of hosts (not all) with no delay between attempts, and then, if it doesn't succeed, it sleeps for 1/3 of the session expiration timeout before trying again. In the worst case the disconnect event can occur after 2/3 of the session expiration timeout has passed, and sleeping for another 1/3 of the session timeout will cause session loss most of the time. A better approach is to check all hosts, but with a random delay between reconnect attempts. The delay must also be independent of the session timeout, so that if we increase the session timeout we also increase the number of available attempts. This improvement covers the case where the C client experiences network problems for a short period of time and is not able to reach any ZooKeeper hosts. The Java client already uses this logic and it works very well.
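The reconnect strategy proposed in ZOOKEEPER-906 (try every host, with a random delay between attempts that does not depend on the session timeout) can be sketched as a Java planner. ReconnectPlanner, the 1..maxDelayMs delay range, and the initial shuffle are assumptions for illustration, not the C client patch or the Java client's actual code; the sketch also ignores the time each connection attempt itself takes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a randomized, all-hosts reconnect schedule. */
final class ReconnectPlanner {
    static final class Attempt {
        final String host;
        final long delayBeforeMs; // random sleep before this attempt
        Attempt(String host, long delayBeforeMs) {
            this.host = host;
            this.delayBeforeMs = delayBeforeMs;
        }
    }

    /** Plans attempts, cycling over all hosts, until the time budget is used up. */
    static List<Attempt> plan(List<String> hosts, long budgetMs, int maxDelayMs, Random rnd) {
        List<Attempt> attempts = new ArrayList<>();
        if (hosts.isEmpty() || maxDelayMs <= 0) {
            return attempts; // nothing to try, or an invalid delay bound
        }
        List<String> order = new ArrayList<>(hosts);
        Collections.shuffle(order, rnd); // spread clients across the ensemble
        long elapsed = 0;
        for (int i = 0; ; i++) {
            // No delay before the first attempt; 1..maxDelayMs ms before each later one.
            long delay = (i == 0) ? 0 : 1 + rnd.nextInt(maxDelayMs);
            if (elapsed + delay >= budgetMs) {
                break; // the budget (e.g. remaining session time) is exhausted
            }
            elapsed += delay;
            // Cycle through every host, not just a couple of them.
            attempts.add(new Attempt(order.get(i % order.size()), delay));
        }
        return attempts;
    }
}
```

Because the delay bound is fixed, a larger budget (for example, a longer session timeout) yields proportionally more attempts, which is the property the issue asks for.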
[jira] Updated: (ZOOKEEPER-877) zkpython does not work with python3.1
[ https://issues.apache.org/jira/browse/ZOOKEEPER-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-877: --- Status: Open (was: Patch Available)
Hi TuxRacer, is it possible for you to re-submit this as a single patch file? We generally request all changes in that format, to ensure that they're committed the way you intended (it also helps with a bunch of other things like reviewing and hudsonqabot, etc.). Here are the basic details: http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute (basically: svn checkout the code, make changes, svn diff and submit the result as ZOOKEEPER-877.patch; you may need to svn add if you are adding new files).
zkpython does not work with python3.1 - Key: ZOOKEEPER-877 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-877 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.1 Environment: linux+python3.1 Reporter: TuxRacer Assignee: TuxRacer Fix For: 3.4.0 Attachments: Doc.tgz, tests_py3k.tgz, zookeeper.c, zookeeper.c.patch.v1, zookeeper.c.patch.v2, zookeeper.c.v2, zookeeper.rst
As written in the contrib/zkpython/README file: "Python >= 2.6 is required. We have tested against 2.6. We have not tested against 3.x." This is probably more a 'new feature' request than a bug; anyway, compiling the python module and calling it returns an error at load time:
python3.1
Python 3.1.2 (r312:79147, May 8 2010, 16:36:46)
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import zookeeper
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /usr/local/lib/python3.1/dist-packages/zookeeper.so: undefined symbol: PyString_AsString
Are there any plans to support Python 3.x? I also tried to write a 3.1 ctypes wrapper, but the C API seems in fact to be written in C++, so python ctypes cannot be used.
[jira] Updated: (ZOOKEEPER-874) FileTxnSnapLog.restore does not call listener
[ https://issues.apache.org/jira/browse/ZOOKEEPER-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-874: --- Status: Open (was: Patch Available)
Cancelling patch; could you provide a test that verifies this issue is addressed? Thanks.
FileTxnSnapLog.restore does not call listener - Key: ZOOKEEPER-874 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-874 Project: Zookeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.3.1 Reporter: Diogo Assignee: Diogo Priority: Trivial Fix For: 3.4.0 Attachments: ZOOKEEPER-874.patch
FileTxnSnapLog.restore() does not call the listener passed as a parameter. The result is that the commitLogs list is empty. When a follower connects to the leader, the leader is forced to send a snapshot to the follower instead of a couple of requests and commits.
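To make the ZOOKEEPER-874 bug concrete, here is a simplified Java model of a restore loop that replays logged transactions newer than the snapshot and reports each one to a listener. TxnListener, Txn and SimpleTxnLog are hypothetical stand-ins, not FileTxnSnapLog's real types. Per the issue, the real restore() skips the equivalent of the listener call, so the commitLogs list stays empty and the leader has to fall back to sending a full snapshot.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback, notified for every transaction replayed during restore. */
interface TxnListener {
    void onTxnLoaded(long zxid, String op);
}

/** Hypothetical logged transaction: a zxid plus a textual operation. */
final class Txn {
    final long zxid;
    final String op;
    Txn(long zxid, String op) { this.zxid = zxid; this.op = op; }
}

/** Simplified in-memory transaction log, not ZooKeeper's FileTxnSnapLog. */
final class SimpleTxnLog {
    private final List<Txn> log = new ArrayList<>();

    void append(long zxid, String op) { log.add(new Txn(zxid, op)); }

    /**
     * Replays every transaction newer than the snapshot into state and
     * reports each one to the listener. Returns the highest zxid applied.
     */
    long restore(long snapshotZxid, List<String> state, TxnListener listener) {
        long highest = snapshotZxid;
        for (Txn t : log) {
            if (t.zxid <= snapshotZxid) {
                continue; // already reflected in the snapshot
            }
            state.add(t.op);                    // apply to the in-memory state
            listener.onTxnLoaded(t.zxid, t.op); // the call the issue says is missing
            highest = t.zxid;
        }
        return highest;
    }
}
```

A test in the spirit of Patrick's request would assert that the listener saw exactly the transactions after the snapshot, as the checks below do for this model.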
[jira] Updated: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-823: --- Status: Open (was: Patch Available) cancelling for now given Thomas is working on this via a new avenue. update ZooKeeper java client to optionally use Netty for connections Key: ZOOKEEPER-823 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823 Project: Zookeeper Issue Type: New Feature Components: java client Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.4.0 Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, testDisconnectedAddAuth_FAILURE, testWatchAutoResetWithPending_FAILURE, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch This jira will port the client side connection code to use netty rather than direct nio.
[jira] Updated: (ZOOKEEPER-784) server-side functionality for read-only mode
[ https://issues.apache.org/jira/browse/ZOOKEEPER-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-784: --- Status: Open (was: Patch Available)
Looks like the patch is failing to apply in one hunk; please resubmit. Thanks.
[exec] 1 out of 2 hunks FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/Watcher.java.rej
server-side functionality for read-only mode Key: ZOOKEEPER-784 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-784 Project: Zookeeper Issue Type: Sub-task Components: server Reporter: Sergey Doroshenko Assignee: Sergey Doroshenko Fix For: 3.4.0 Attachments: ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch, ZOOKEEPER-784.patch
As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create ReadOnlyZooKeeperServer, which comes into play when a peer is partitioned.
[jira] Updated: (ZOOKEEPER-780) zkCli.sh generates a ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-780: --- Status: Open (was: Patch Available)
Andrei, this is a good fix; could you create a test for it? Thanks.
zkCli.sh generates a ArrayIndexOutOfBoundsException - Key: ZOOKEEPER-780 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-780 Project: Zookeeper Issue Type: Bug Components: scripts Affects Versions: 3.3.1 Environment: Linux Ubuntu running in VMPlayer on top of Windows XP Reporter: Miguel Correia Assignee: Andrei Savu Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-780.patch, ZOOKEEPER-780.patch, ZOOKEEPER-780.patch
I'm starting to play with ZooKeeper, so I'm still running it in standalone mode. This is not a big issue, but here it goes for the record. I ran zkCli.sh to run some commands against the server. I created a znode /groups. When I tried to create a znode client_1 inside /groups, I forgot to include the data: an exception was generated and zkCli.sh crashed, instead of just showing an error. I tried a few variations and it seems the problem is leaving out the data. A copy of the screen:
[zk: localhost:2181(CONNECTED) 3] create /groups firstgroup
Created /groups
[zk: localhost:2181(CONNECTED) 4] create -e /groups/client_1
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:678)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
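A minimal sketch of the kind of check that avoids the ZOOKEEPER-780 crash: validate the argument count before indexing into the argument array, and report a usage error instead of letting ArrayIndexOutOfBoundsException escape. CreateCommand, its parsing rules and the usage text are hypothetical, not ZooKeeperMain's code or the attached patch.

```java
import java.util.Optional;

/** Hypothetical parsed form of "create [-s] [-e] path data". */
final class CreateCommand {
    final boolean ephemeral;
    final boolean sequential;
    final String path;
    final String data;

    private CreateCommand(boolean ephemeral, boolean sequential, String path, String data) {
        this.ephemeral = ephemeral;
        this.sequential = sequential;
        this.path = path;
        this.data = data;
    }

    /** Returns the parsed command, or empty (after printing usage) if arguments are missing. */
    static Optional<CreateCommand> parse(String[] args) {
        boolean ephemeral = false;
        boolean sequential = false;
        int i = 1; // args[0] is the command name, "create"
        // Consume leading flags such as -e and -s.
        while (i < args.length && args[i].startsWith("-")) {
            if (args[i].equals("-e")) {
                ephemeral = true;
            } else if (args[i].equals("-s")) {
                sequential = true;
            } else {
                return usage("unknown option " + args[i]);
            }
            i++;
        }
        // Check that both path and data are present before indexing into args.
        if (args.length - i < 2) {
            return usage("missing path and/or data");
        }
        return Optional.of(new CreateCommand(ephemeral, sequential, args[i], args[i + 1]));
    }

    private static Optional<CreateCommand> usage(String problem) {
        System.err.println(problem + "; usage: create [-s] [-e] path data");
        return Optional.empty();
    }
}
```

With this check, the command from the report (create -e /groups/client_1) prints a usage message and the shell keeps running.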
[jira] Updated: (ZOOKEEPER-740) zkpython leading to segfault on zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-740: --- Status: Open (was: Patch Available)
Looks like the patch is failing to apply. Could someone update and resubmit?
zkpython leading to segfault on zookeeper - Key: ZOOKEEPER-740 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-740 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.0 Reporter: Federico Assignee: Henry Robinson Priority: Critical Fix For: 3.4.0 Attachments: ZOOKEEPER-740.patch
The program that we are implementing uses the python binding for ZooKeeper, but it sometimes crashes with a segfault; here is the backtrace from gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xad244b70 (LWP 28216)]
0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
2488 ../Objects/abstract.c: No such file or directory.
        in ../Objects/abstract.c
(gdb) bt
#0  0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488
#1  0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
#2  0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194) at ../Objects/abstract.c:2480
#3  0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1, path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
#4  0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
#5  deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:317
#6  0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
#7  0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
#8  0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#9  0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6
[jira] Updated: (ZOOKEEPER-756) some cleanup and improvements for zooinspector
[ https://issues.apache.org/jira/browse/ZOOKEEPER-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-756: --- Status: Open (was: Patch Available) The patch is failing to apply; please update it against the latest source base and resubmit. Thanks! some cleanup and improvements for zooinspector -- Key: ZOOKEEPER-756 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-756 Project: Zookeeper Issue Type: Improvement Components: contrib Affects Versions: 3.3.0 Reporter: Thomas Koch Assignee: Colin Goodheart-Smithe Fix For: 3.4.0 Attachments: zooInspectorChanges.patch Copied from the already closed ZOOKEEPER-678: * Specify the exact URL where the icons are from. It's best to also include the link in the NOTICE.txt file. It seems that zooinspector finds its icons only if the icons folder is in the current path. But when I install zooinspector as part of the Zookeeper Debian package, I want to be able to call it regardless of the current path. Could you use getResource() or something similar so that I can point to the icons location from the wrapper shell script? Can I place the zooinspector config files in /etc/zookeeper/zooinspector/? Could I give zooinspector a property to point to the config file location? There are several places where viewers is misspelled as Veiwers. Please do a case-insensitive search for veiw to correct these. Even the config file defaultNodeVeiwers.cfg is misspelled like this. This has the potential to confuse the hell out of people when debugging something!
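A minimal sketch of the two packaging requests above, assuming Java 8+. It resolves the config directory from a system property so a wrapper script can pass `-D...`, and it loads icons from the classpath instead of the working directory. The property name `zooinspector.config.dir`, the class `InspectorPaths`, and the `/icons/` resource prefix are hypothetical, not existing ZooInspector settings.

```java
import java.io.File;
import java.net.URL;

/** Hypothetical helper: locate ZooInspector config and icons independently of the working directory. */
final class InspectorPaths {
    // Hypothetical property; a wrapper script would set it with -Dzooinspector.config.dir=...
    static final String CONFIG_DIR_PROPERTY = "zooinspector.config.dir";

    private InspectorPaths() {}

    /** Returns the directory named by the system property, or defaultDir if it is unset or empty. */
    static File configDir(String defaultDir) {
        String dir = System.getProperty(CONFIG_DIR_PROPERTY);
        return new File(dir != null && !dir.isEmpty() ? dir : defaultDir);
    }

    /** Looks an icon up on the classpath; returns null if no such resource is packaged. */
    static URL icon(String name) {
        // The leading slash resolves from the classpath root, not from this class's package.
        return InspectorPaths.class.getResource("/icons/" + name);
    }
}
```

A Debian wrapper script could then start the tool with `-Dzooinspector.config.dir=/etc/zookeeper/zooinspector`, and the icons would travel inside the jar.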
[jira] Commented: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
[ https://issues.apache.org/jira/browse/ZOOKEEPER-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930429#action_12930429 ] Patrick Hunt commented on ZOOKEEPER-906: Hi Radu, yes I agree, it's difficult to parse in some cases. Nigel/Giri/the build team are working to improve it (WIP). bq. -1 tests included. The patch doesn't appear to include any new or modified tests. This is the main issue: the patch doesn't include any tests validating the modified functionality. bq. -1 on the core tests You can't see it in the summary, but if you look at the hudson raw console near the end, the c tests have failed (d/l the console and open it in an editor). This might be a false failure, though; I saw something similar on another test. I've notified Nigel about it and he's going to take a look. Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client --- Key: ZOOKEEPER-906 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Reporter: Radu Marin Assignee: Radu Marin Fix For: 3.4.0 Attachments: ZOOKEEPER-906.patch Original Estimate: 24h Remaining Estimate: 24h Currently, when a C client gets disconnected, it retries a couple of hosts (not all) with no delay between attempts, and then, if it doesn't succeed, it sleeps for 1/3 of the session expiration timeout before trying again. In the worst case the disconnect event occurs after 2/3 of the session expiration timeout has passed, and sleeping for another 1/3 of the session timeout will then cause a session loss most of the time. A better approach is to check all hosts, with a random delay between reconnect attempts. Also, the delay must be independent of the session timeout, so that increasing the session timeout also increases the number of available attempts. 
This improvement covers the case when the C client experiences network problems for a short period of time and is not able to reach any zookeeper hosts. The Java client already uses this logic and works very well.
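The proposed policy can be sketched in Java for illustration; the C client would need equivalent logic. Each round tries every host in a fresh random order. Between attempts it pauses for a random delay drawn from a fixed bound rather than from the session timeout, so a longer budget simply buys more attempts. `ReconnectPolicy`, `maxDelayMs`, and `budgetMs` are illustrative assumptions. The sketch also ignores the time spent inside each connection attempt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical sketch of the ZOOKEEPER-906 proposal; not the C client's actual code. */
final class ReconnectPolicy {
    /** Abstraction over Thread.sleep so the policy can be exercised without real waiting. */
    interface Sleeper {
        void sleep(long ms) throws InterruptedException;
    }

    private final long maxDelayMs;
    private final Random random;

    ReconnectPolicy(long maxDelayMs, Random random) {
        if (maxDelayMs < 1) {
            throw new IllegalArgumentException("maxDelayMs must be at least 1");
        }
        this.maxDelayMs = maxDelayMs;
        this.random = random;
    }

    /** Random pause in [1, maxDelayMs] ms; deliberately independent of the session timeout. */
    long nextDelayMs() {
        return 1 + Math.floorMod(random.nextLong(), maxDelayMs);
    }

    /** Returns the first host that connects, or null once budgetMs of pauses would be exceeded. */
    String reconnect(List<String> hosts, Predicate<String> tryConnect,
                     long budgetMs, Sleeper sleeper) throws InterruptedException {
        if (hosts.isEmpty()) {
            return null;
        }
        List<String> order = new ArrayList<>(hosts);
        long remaining = budgetMs;
        while (true) {
            // Each round visits every host once, in a fresh random order.
            Collections.shuffle(order, random);
            for (String host : order) {
                if (tryConnect.test(host)) {
                    return host;
                }
                long delay = nextDelayMs();
                if (delay > remaining) {
                    return null; // budget (e.g. what is left of the session timeout) is used up
                }
                sleeper.sleep(delay); // at least 1 ms per failed attempt, so the loop terminates
                remaining -= delay;
            }
        }
    }
}
```

In real use the sleeper would be `Thread::sleep`, and the predicate would wrap an actual connection attempt.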
[jira] Updated: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-922: --- Fix Version/s: 3.4.0 Assignee: Camille Fournier Status: Patch Available (was: Open) Marking this as Patch Available so we don't lose it; Camille is asking for feedback. enable faster timeout of sessions in case of unexpected socket disconnect - Key: ZOOKEEPER-922 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922 Project: Zookeeper Issue Type: Improvement Components: server Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.4.0 Attachments: ZOOKEEPER-922.patch When a client connection is closed due to a socket error instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a zookeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing for a longer heartbeat-based timeout for Java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout.
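A simplified sketch of the proposal, assuming a tracker that keeps one expiry deadline per session. A socket error only pulls the deadline in to `minSessionTimeout` and never extends it, while a session that keeps heartbeating keeps its negotiated timeout. `SessionExpiry` and its methods are hypothetical and far simpler than ZooKeeper's real session tracking.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, heavily simplified tracker illustrating the ZOOKEEPER-922 proposal. */
final class SessionExpiry {
    private final long minSessionTimeoutMs;
    // sessionId -> absolute expiry time in ms
    private final Map<Long, Long> deadlines = new HashMap<>();

    SessionExpiry(long minSessionTimeoutMs) {
        this.minSessionTimeoutMs = minSessionTimeoutMs;
    }

    /** Heartbeat or request received: push the deadline out by the negotiated timeout. */
    void touch(long sessionId, long nowMs, long negotiatedTimeoutMs) {
        deadlines.put(sessionId, nowMs + negotiatedTimeoutMs);
    }

    /** Connection lost to a socket error (not an explicit close): shorten, never lengthen. */
    void onSocketError(long sessionId, long nowMs) {
        deadlines.computeIfPresent(sessionId,
                (id, deadline) -> Math.min(deadline, nowMs + minSessionTimeoutMs));
    }

    boolean isExpired(long sessionId, long nowMs) {
        Long deadline = deadlines.get(sessionId);
        return deadline == null || nowMs >= deadline;
    }
}
```

If the crashed client comes back and reconnects before the shortened deadline, a fresh `touch` restores the full negotiated timeout.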
[jira] Updated: (ZOOKEEPER-921) zkPython interferes with/corrupts Python's 'logging' module
[ https://issues.apache.org/jira/browse/ZOOKEEPER-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-921: --- Fix Version/s: 3.4.0 3.3.3 Assignee: Nicholas Knight zkPython interferes with/corrupts Python's 'logging' module --- Key: ZOOKEEPER-921 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-921 Project: Zookeeper Issue Type: Bug Components: contrib-bindings Affects Versions: 3.3.1, 3.4.0 Environment: Mac OS X 10.6.4, included Python 2.6.1 Reporter: Nicholas Knight Assignee: Nicholas Knight Fix For: 3.3.3, 3.4.0 Attachments: zktest.py Calling {{zookeeper.create()}} seems, under certain circumstances, to be corrupting a subsequent call to Python's {{logging}} module. Specifically, if the node does not exist (but its parent does), I end up with a traceback like this when I try to make the logging call:
{noformat}
Traceback (most recent call last):
  File "zktest.py", line 21, in <module>
    logger.error("Boom?")
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1046, in error
    if self.isEnabledFor(ERROR):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1206, in isEnabledFor
    return level >= self.getEffectiveLevel()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1194, in getEffectiveLevel
    while logger:
TypeError: an integer is required
{noformat}
But if the node already exists, or the parent does not exist, I get the appropriate NodeExists or NoNode exceptions. I'll be attaching a test script that can be used to reproduce this behavior.
[jira] Updated: (ZOOKEEPER-920) L7 (application layer) ping support
[ https://issues.apache.org/jira/browse/ZOOKEEPER-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-920: --- Fix Version/s: 3.4.0 Assignee: Chang Song Hi Chang Song, you need to provide this as a patch file in order for us to consider it; see this: http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute Basically, do an svn diff at the top level and attach it to this JIRA. Regards. L7 (application layer) ping support --- Key: ZOOKEEPER-920 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-920 Project: Zookeeper Issue Type: New Feature Components: c client Affects Versions: 3.3.1 Reporter: Chang Song Assignee: Chang Song Priority: Minor Fix For: 3.4.0 Zookeeper is used in applications where fault tolerance is important. Its client I/O thread sends/receives heartbeats to/from the Zookeeper ensemble to stay connected. However, a healthy heartbeat does not always mean that the application that uses the Zookeeper client is in good health; it only means that the ZK client thread is in good health. Thus I needed something that can be tagged onto the Zookeeper ping that represents L7 (application) health as well. I have modified the C client source to support this in a minimal way. I am new to Zookeeper, so please review this code. I am actually using this code in our in-house solution. https://github.com/tru64ufs/zookeeper/commit/2196d6d5114a2fd2c0a3bc9a55f4494d47d2aece Thank you very much.
[jira] Created: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end Maven site generation plugin turned out to be by far the best option. You can see our nascent site here (no attempt at styling,etc so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence notice this was standard wiki markup (confluence wiki markup, same as available from apache) You can read more about mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Notice that other formats are available, not just confluence markup, also note that you can use different markup formats if you like in the same site (although probably not a great idea, but in some cases might be handy, for example whirr uses the confluence wiki, so we can pretty much copy/paste source docs from wiki to our site (svn) if we like)
[jira] Updated: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-925: --- Description: See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end Maven site generation plugin turned out to be by far the best option. You can see our nascent site here (no attempt at styling,etc so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence notice this was standard wiki markup (confluence wiki markup, same as available from apache) You can read more about mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Notice that other formats are available, not just confluence markup, also note that you can use different markup formats if you like in the same site (although probably not a great idea, but in some cases might be handy, for example whirr uses the confluence wiki, so we can pretty much copy/paste source docs from wiki to our site (svn) if we like) Re maven vs our current ant based build. It's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period. was: See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end Maven site generation plugin turned out to be by far the best option. 
You can see our nascent site here (no attempt at styling,etc so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence notice this was standard wiki markup (confluence wiki markup, same as available from apache) You can read more about mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Notice that other formats are available, not just confluence markup, also note that you can use different markup formats if you like in the same site (although probably not a great idea, but in some cases might be handy, for example whirr uses the confluence wiki, so we can pretty much copy/paste source docs from wiki to our site (svn) if we like) Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end Maven site generation plugin turned out to be by far the best option. 
You can see our nascent site here (no attempt at styling,etc so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence notice this was standard wiki markup (confluence wiki markup, same as available from apache) You can read more about mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Notice that other formats are available, not just confluence markup, also note that you can use different markup formats if you like in the same site (although probably not a great idea, but in some cases might be handy, for example whirr uses the confluence wiki, so we can pretty much copy/paste source docs from wiki to our site (svn) if we like) Re maven vs our current ant based build. It's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period.
[jira] Commented: (ZOOKEEPER-918) Review of BookKeeper Documentation (Sequence flow and failure scenarios)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928697#action_12928697 ] Patrick Hunt commented on ZOOKEEPER-918: There are really two options for docs (today):
1) Put it into svn as a forrest doc. Typically this is for documentation that's version specific, i.e. it needs to be versioned along with the code.
2) Put it into the wiki. Usually this is for detail that isn't version specific.
Putting it into svn requires a patch for each change, which adds to the overhead. Another way to go is to start on the wiki and, once the doc is fairly stable, move it to svn. Review of BookKeeper Documentation (Sequence flow and failure scenarios) Key: ZOOKEEPER-918 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-918 Project: Zookeeper Issue Type: Task Components: documentation Reporter: Amit Jaiswal Priority: Trivial Fix For: 3.3.3, 3.4.0 Attachments: BookKeeperInternals.doc, BookKeeperInternals.pdf Original Estimate: 2h Remaining Estimate: 2h I have prepared a document describing some of the internals of bookkeeper in terms of: 1. Sequence of operations 2. Files layout 3. Failure scenarios The document was prepared mostly by reading the code. Could somebody who understands the design review it?
[jira] Updated: (ZOOKEEPER-918) Review of BookKeeper Documentation (Sequence flow and failure scenarios)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-918: --- Assignee: Amit Jaiswal Review of BookKeeper Documentation (Sequence flow and failure scenarios) Key: ZOOKEEPER-918 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-918 Project: Zookeeper Issue Type: Task Components: documentation Reporter: Amit Jaiswal Assignee: Amit Jaiswal Priority: Trivial Fix For: 3.3.3, 3.4.0 Attachments: BookKeeperInternals.doc, BookKeeperInternals.pdf Original Estimate: 2h Remaining Estimate: 2h I have prepared a document describing some of the internals of bookkeeper in terms of: 1. Sequence of operations 2. Files layout 3. Failure scenarios The document was prepared mostly by reading the code. Could somebody who understands the design review it?
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928240#action_12928240 ] Patrick Hunt commented on ZOOKEEPER-917: Sounds like we need more documentation detailing the election process and what the expected behavior is. Flavio, perhaps you could create a JIRA for that and start collecting this type of information? In particular you could link to JIRAs of this type, with the intent of writing general documentation that includes detail about these specific types of questions. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.
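The reporter's expectation rests on how votes are ordered. The sketch below is simplified and rests on an assumption: votes are compared by zxid first and server id second, which is our understanding of FastLeaderElection in these releases. Election epochs and the rest of the protocol are omitted. Under this ordering, a fresh node with zxid 0 should lose to any peer that has seen later transactions. `Vote` and `VoteOrder` are illustrative names, not ZooKeeper classes.

```java
import java.util.List;

/** A vote: the proposed leader's server id and the last zxid it has seen. */
final class Vote {
    final long serverId;
    final long zxid;

    Vote(long serverId, long zxid) {
        this.serverId = serverId;
        this.zxid = zxid;
    }
}

final class VoteOrder {
    /** True if next should replace current: higher zxid wins, ties go to the higher server id. */
    static boolean supersedes(Vote next, Vote current) {
        return next.zxid > current.zxid
                || (next.zxid == current.zxid && next.serverId > current.serverId);
    }

    /** Returns the vote that no other vote in the list supersedes. */
    static Vote best(List<Vote> votes) {
        if (votes.isEmpty()) {
            throw new IllegalArgumentException("no votes");
        }
        Vote best = votes.get(0);
        for (Vote v : votes) {
            if (supersedes(v, best)) {
                best = v;
            }
        }
        return best;
    }
}
```

With hypothetical ids, votes (1, zxid 5), (3, zxid 0) and (2, zxid 5) yield server 2, never the empty server 3.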
[jira] Commented: (ZOOKEEPER-912) ZooKeeper client logs trace and debug messages at level INFO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928430#action_12928430 ] Patrick Hunt commented on ZOOKEEPER-912: Hi John, thanks for the feedback. bq. I've actually had to repackage the zk jar without log4j.xml to tone down the logging. Well, we don't have a log4j.xml at all. We do have a log4j.properties, but that's not included in our primary or maven bin jar files. I just looked again and I don't see it included (at least in 3.3.1). Perhaps this is being pulled in from elsewhere in your environment? We do provide a default log4j.properties in the conf directory of the release artifact. However, the intent is for users to either use that directly or customize it based on their needs, which is why it's a config file and not hardcoded. bq. If I could weigh in on this one too... logging levels That's basically what we do. I grant that we skew toward logging at INFO some detail that might belong at DEBUG, and at DEBUG some that should be TRACE. We have been working on that of late (see JIRA; a number of logging level changes went into 3.3). We could push a few more of the INFO messages to DEBUG, but I'm reluctant for a few reasons. Primarily, we often get reports from users who see issues where the only detail we have to go on (esp. given this is a complex distributed system) is the INFO (and higher) level logs. My experience supporting a number of production teams (15+) tells me this would compromise our ability to help them resolve issues. ZooKeeper client logs trace and debug messages at level INFO Key: ZOOKEEPER-912 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-912 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Anthony Urso Priority: Minor Fix For: 3.4.0 Attachments: zk-loglevel.patch ZK logs a lot of uninformative trace and debug messages to level INFO. 
This fuzzes up everything and makes it easy to miss useful log info.
[jira] Commented: (ZOOKEEPER-912) ZooKeeper client logs trace and debug messages at level INFO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928436#action_12928436 ] Patrick Hunt commented on ZOOKEEPER-912: John, could you give some examples of the log messages (ones actually output to the log) that you thought were excessive? (Did you mean client or server? Both?) It might help frame the conversation, and we might be able to address some of the more egregious ones. Here's a client session (default log4j.properties) where I created a client, ran a few commands, then quit (CLI shell):
2010-11-04 17:16:21,319 - INFO [main:zookee...@373] - Initiating client connection, connectString= sessionTimeout=3 watcher=org.apache.zookeeper.zookeepermain$mywatc...@2c6f7ce9
2010-11-04 17:16:21,347 - INFO [main-SendThread():clientcnxn$sendthr...@1000] - Opening socket connection to server localhost/127.0.0.1:2181
2010-11-04 17:16:21,392 - INFO [main-SendThread(localhost:2181):clientcnxn$sendthr...@908] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2010-11-04 17:16:21,486 - INFO [main-SendThread(localhost:2181):clientcnxn$sendthr...@701] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x12c1963f821, negotiated timeout = 3
then nothing until I quit
2010-11-04 17:16:49,401 - INFO [main:zookee...@538] - Session: 0x12c1963f821 closed
So the _entirety_ of the logging today is just 4 messages for establishing the client and one for closing it. While the client has an active session, no messages are output. Personally I don't see that as excessive, but others might not think the same. In my experience these are some very important messages when helping with postmortems of production failures. I could see us dropping msgs 1-3 to debug level (keeping 4, the established msg), as long as we highlight connection attempts that fail. 
Having detail about the sessionid and negotiated timeout is pretty critical though, from an informational perspective. ZooKeeper client logs trace and debug messages at level INFO Key: ZOOKEEPER-912 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-912 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Anthony Urso Priority: Minor Fix For: 3.4.0 Attachments: zk-loglevel.patch ZK logs a lot of uninformative trace and debug messages to level INFO. This fuzzes up everything and makes it easy to miss useful log info.
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928441#action_12928441 ] Patrick Hunt commented on ZOOKEEPER-914: Hi Vishal, we do appreciate your feedback and interest. You've been doing a great job highlighting issues and working to resolve them. Again, thanks. We also feel your frustrations. We wish we had unlimited time and resources to develop and test ZK; unfortunately, that's not the case. This is one of the many reasons why we brought the project to Apache: to build community and gain the insights of developers and users such as yourself. Is everything done, is it all perfect code? No. However, the source is open, the process is open, and we hope that more contributors will sign on to working together and making significant contributions. This doesn't have to be just new features; it very much could be testing (code and QA), documentation and all the other bits that go into useful software. I encourage you to bring your QA related concerns to the larger group. That's something that should be discussed on the dev list rather than here in a JIRA for a specific issue. As you can see, the primary committers work hard to address all the issues found. However, there just aren't enough of us (and we ourselves work on this in our spare time to varying degrees). Perhaps others will feel similarly and you can work to address some of the deficiencies. I'd *love* to see more unit tests and more system testing. If you want to make that happen I'd do my best to support you. Regards. (I'll let Flavio comment on the further specifics of this particular issue.) QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. 
While testing our application we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():
"Thread-3" prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        - locked <0x7fa93315f988> (a java.lang.Object)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
I had pointed out this bug, along with several other problems in QuorumCnxManager, earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager.
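One way to bound the blocking read, sketched under assumptions: read the connecting peer's server id from a plain `java.net.Socket` after calling `setSoTimeout`. A silent or half-open peer then produces a `SocketTimeoutException` instead of hanging the listener thread. This is not necessarily how the issue was fixed, and `TimedHandshake` is hypothetical. The sketch also assumes the id arrives as an 8-byte long. As far as we know, `setSoTimeout` does not bound reads made directly through a blocking `SocketChannel`, which is why the sketch reads from the socket's stream.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

/** Hypothetical helper: read a peer's server id without risking an unbounded block. */
final class TimedHandshake {
    /**
     * Reads an 8-byte server id, waiting at most timeoutMs for each blocking read.
     * A silent peer causes java.net.SocketTimeoutException (an IOException) instead
     * of blocking forever; the caller would normally close the connection and move on.
     */
    static long readServerId(Socket socket, int timeoutMs) throws IOException {
        socket.setSoTimeout(timeoutMs); // applies to reads on the socket's InputStream
        DataInputStream in = new DataInputStream(socket.getInputStream());
        return in.readLong();
    }
}
```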
[jira] Commented: (ZOOKEEPER-850) Switch from log4j to slf4j
[ https://issues.apache.org/jira/browse/ZOOKEEPER-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927036#action_12927036 ] Patrick Hunt commented on ZOOKEEPER-850: Hi Olaf, thanks for the patch. A couple of questions/comments:
1) Can you create this patch against trunk? We only put bug fixes into the fix releases, so this would be slated for the 3.4.0 release (current trunk).
2) Do any of the shell scripts need to be updated? (bin directory)
3) I see references to log4j in the build.xml file(s). Do any of these need to be updated? It would be good if you could build a release (ant tar) and verify that the built archive can run the zk server/client via the bin scripts.
4) It looks like the documentation also needs to be updated; do an egrep -Ri log4j src/docs/src/documentation/ log4j from the top level. We should at least update the existing docs; it would also be helpful to include additional information to help both users and developers make the switch.
5) We typically create release notes for a release. It would be good to document in this JIRA (the release notes section) any details we should include in the release notes that go along with the release: a short statement detailing the change and any impact (you've given some detail in the comments; basically, wrap it up into something short and simple for users to follow during an upgrade).
Thanks! Switch from log4j to slf4j -- Key: ZOOKEEPER-850 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-850 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Olaf Krische Assignee: Olaf Krische Fix For: 3.4.0 Attachments: ZOOKEEPER-3.3.1-log4j-slf4j-20101031.patch.bz2 Hello, I would like to see slf4j integrated into zookeeper instead of relying explicitly on log4j. slf4j is an abstract logging framework. There are adapters from slf4j to many logger implementations, one of them is log4j. 
I would rather not make the decision about which log engine to use this early. This would help me embed zookeeper in my own applications (which use a different logger implementation, but with slf4j as the basis). What do you think? (As far as I can see, these slf4j requests are flooding all the other Apache projects as well :-) Maybe for 3.4 or 4.0? I can offer a patchset; I already have experience with such a migration. :-)
[jira] Commented: (ZOOKEEPER-872) Small fixes to PurgeTxnLog
[ https://issues.apache.org/jira/browse/ZOOKEEPER-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927082#action_12927082 ] Patrick Hunt commented on ZOOKEEPER-872: Hi Vishal, I noticed a couple of issues. This class is a command line utility, so we output to both the command line and the log. The usage() output in particular should go to stdout so that the user will see it regardless of the log settings (fine if you want to output it to LOG as well, but I think this is unnecessary). Good catch on the error handling for this:
public static void purge(File dataDir, File snapDir, int num) throws IOException {
-if (num < 3) {
-throw new IllegalArgumentException("count should be greater than 3");
+if (num < 2) {
+throw new IllegalArgumentException("count should be greater than 1");
}
However, the number 3 was chosen to ensure that people don't shoot themselves in the foot (if the most recent logs get corrupted, we'll fall back to the prior ones when attempting to recover). There really should be a comment to this effect (would be good to add). I don't know how Mahadev feels about this setting (min 3 vs some other number), but he might have more insight, as IIRC he implemented this originally. The following is there to provide feedback to the user when running on the command line:
-System.out.println("Removing file: " +
-DateFormat.getDateTimeInstance().format(f.lastModified())+
-"\t"+f.getPath());
again, regardless of the logging setup. Perhaps we should have a -q option that turns off the CLI output and just logs to the log file? I know this has been an issue previously (stdout/err), given that cron by default sends out emails containing stdout/err. Also, is there a test for this? If you're up to it, it would be great to add one. 
Small fixes to PurgeTxnLog --- Key: ZOOKEEPER-872 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-872 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-872 PurgeTxnLog forces us to have at least 2 backups (by requiring count >= 3). Also, it prints to stdout instead of using the Logger.
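The two suggestions above (document why the minimum is 3, and add a -q option so routine output can be kept off stdout) can be sketched as follows. PurgeOptions, its "[-q] <count>" argument format, and its methods are hypothetical and only illustrate the idea; they are not the real PurgeTxnLog code or command line.

```java
import java.util.logging.Logger;

// Hypothetical helper, not the real PurgeTxnLog: it only illustrates the
// minimum-count check and a -q (quiet) option as discussed above.
class PurgeOptions {
    private static final Logger LOG = Logger.getLogger(PurgeOptions.class.getName());

    // Keep at least 3 snapshots/logs: if the most recent ones are corrupted,
    // recovery can still fall back to an older set.
    static final int MIN_RETAIN_COUNT = 3;

    final boolean quiet;
    final int retainCount;

    PurgeOptions(boolean quiet, int retainCount) {
        if (retainCount < MIN_RETAIN_COUNT) {
            throw new IllegalArgumentException(
                    "count should be at least " + MIN_RETAIN_COUNT);
        }
        this.quiet = quiet;
        this.retainCount = retainCount;
    }

    // Parses hypothetical arguments of the form: [-q] <count>
    static PurgeOptions parse(String[] args) {
        boolean quiet = args.length > 0 && args[0].equals("-q");
        int countIndex = quiet ? 1 : 0;
        if (args.length != countIndex + 1) {
            // Usage goes to stdout so the user sees it whatever the log settings.
            System.out.println("usage: purge [-q] <count>");
            throw new IllegalArgumentException("expected exactly one count argument");
        }
        return new PurgeOptions(quiet, Integer.parseInt(args[countIndex]));
    }

    // Progress is always sent to the log; it is echoed to stdout only when
    // -q was not given.
    void report(String message) {
        LOG.info(message);
        if (!quiet) {
            System.out.println(message);
        }
    }
}
```

Where the logged copy ends up (file, console) is still decided by the logging configuration, which is what lets a cron job run with -q stay quiet on stdout.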
[jira] Commented: (ZOOKEEPER-912) ZooKeeper client logs trace and debug messages at level INFO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925719#action_12925719 ] Patrick Hunt commented on ZOOKEEPER-912: bq. Zookeeper logs nearly everything at least at level info, regardless of severity. That's incorrect. I did a quick grep for logging in the main src and see the following:

egrep -R LOG\.error src/java/main/. |wc -l
78
egrep -R LOG\.warn src/java/main/. |wc -l
175
egrep -R LOG\.info src/java/main/. |wc -l
127
egrep -R LOG\.debug src/java/main/. |wc -l
114
egrep -R LOG\.trace src/java/main/. |wc -l
28

So WARN is actually the most common level among our logging calls. Perhaps you think this because you mainly see INFO messages, but that's to be expected (typically things work; we only log WARN/ERROR when bad things happen). I didn't say anything about all/nothing. Check the code: we have a number of messages at various levels, including TRACE/DEBUG. If you don't want to see the informational messages for a particular class, you can configure that, as I pointed out earlier: http://hadoop.apache.org/zookeeper/docs/current/zookeeperInternals.html#sc_logging We consider both of the messages you listed to be informational, given that we expect, and recover from, the second. ZooKeeper client logs trace and debug messages at level INFO Key: ZOOKEEPER-912 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-912 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Anthony Urso Priority: Minor Fix For: 3.4.0 Attachments: zk-loglevel.patch ZK logs a lot of uninformative trace and debug messages at level INFO. This fuzzes up everything and makes it easy to miss useful log info.
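The per-class configuration mentioned above can be illustrated with java.util.logging, used here only as a stand-in for the log4j configuration ZooKeeper itself uses. Raising the level of one class's logger hides its INFO messages while every other logger keeps its inherited level. PerClassLogLevel is a hypothetical helper, and the logger names are just example strings.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustration with java.util.logging (not log4j): quiet one class's logger.
class PerClassLogLevel {
    // Sets the threshold of a single named logger to WARNING, so its INFO
    // messages are dropped while other loggers keep their inherited level.
    // The caller should hold on to the returned Logger: LogManager keeps only
    // weak references, so an unreferenced logger (and its level) can be lost.
    static Logger quiet(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.WARNING);
        return logger;
    }
}
```

The same effect is normally achieved declaratively in the logging configuration file rather than in code; the programmatic form just makes the per-logger scoping easy to see.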
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925843#action_12925843 ] Patrick Hunt commented on ZOOKEEPER-914: Flavio, in item 2 you mention mocks; consider using Mockito. I've had a lot of luck with it personally, and Hadoop itself has moved to using it in its tests. http://code.google.com/p/mockito/ QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. While testing our application, we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():

Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        - locked 0x7fa93315f988 (a java.lang.Object)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)

I had pointed out this bug, along with several other problems in QuorumCnxManager, earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as part of ZOOKEEPER-822. I am working on a fix, and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode.
It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. This also points to a lack of failure tests for QuorumCnxManager.
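A minimal sketch of the general remedy for a handshake read that can block forever: put a timeout on the read so a silent peer produces an exception instead of a hung listener thread. It uses a plain java.net.Socket with setSoTimeout rather than the SocketChannel code in QuorumCnxManager, and it assumes, for illustration only, that the handshake is a single 8-byte server id. HandshakeReader and readServerId are hypothetical names, not the actual ZOOKEEPER-914 fix.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

class HandshakeReader {
    // Reads the peer's (assumed) 8-byte server id, giving up after timeoutMs
    // with a java.net.SocketTimeoutException instead of blocking indefinitely.
    static long readServerId(Socket sock, int timeoutMs) throws IOException {
        sock.setSoTimeout(timeoutMs);
        DataInputStream in = new DataInputStream(sock.getInputStream());
        return in.readLong();
    }
}
```

A local ServerSocket paired with a client that never writes is also a cheap way to simulate the "silent peer" failure in a unit test, which touches on the testing gap mentioned above.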
[jira] Commented: (ZOOKEEPER-912) ZooKeeper client logs trace and debug messages at level INFO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925845#action_12925845 ] Patrick Hunt commented on ZOOKEEPER-912: Hi Anthony, I realized in the shower this morning: by "Zookeeper", did you mean ZooKeeper.java? My bad. I looked at this class again, and it does log at levels other than INFO. Really, it should have TRACE-level logs for each of the API entry points. I'm concerned about pushing down the INFO-level logs you highlighted, though, for a couple of reasons: 1) in our experience those messages are very useful for understanding the runtime state of the client, and 2) many users don't run production at TRACE (and some don't want to run at DEBUG). What's your rule of thumb for what should be logged at the various levels? ZooKeeper client logs trace and debug messages at level INFO Key: ZOOKEEPER-912 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-912 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Anthony Urso Priority: Minor Fix For: 3.4.0 Attachments: zk-loglevel.patch ZK logs a lot of uninformative trace and debug messages at level INFO. This fuzzes up everything and makes it easy to miss useful log info.
[jira] Commented: (ZOOKEEPER-897) C Client seg faults during close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925846#action_12925846 ] Patrick Hunt commented on ZOOKEEPER-897: Perhaps we should rely on existing testing for this one, but open a new JIRA to refactor the client specifically to allow testing (i.e., a way to inject the helper code without needing to edit zookeeper.c directly)? C Client seg faults during close Key: ZOOKEEPER-897 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-897 Project: Zookeeper Issue Type: Bug Components: c client Reporter: Jared Cantwell Assignee: Jared Cantwell Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEEPER-897.diff, ZOOKEEPER-897.patch We observed a crash while closing our C client. It was in the do_io() thread, which was still processing during the close() call:

#0  queue_buffer (list=0x6bd4f8, b=0x0, add_to_front=0) at src/zookeeper.c:969
#1  0x0046234e in check_events (zh=0x6bd480, events=<value optimized out>) at src/zookeeper.c:1687
#2  0x00462d74 in zookeeper_process (zh=0x6bd480, events=2) at src/zookeeper.c:1971
#3  0x00469c34 in do_io (v=0x6bd480) at src/mt_adaptor.c:311
#4  0x77bc59ca in start_thread () from /lib/libpthread.so.0
#5  0x76f706fd in clone () from /lib/libc.so.6
#6  0x in ?? ()

We tracked down the sequence of events; the cause is that input_buffer is freed by a thread other than the do_io thread that relies on it:

1. do_io() calls check_events()
2. the if (events & ZOOKEEPER_READ) branch executes
3. the if (rc > 0) branch executes
4. the if (zh->input_buffer != &zh->primer_buffer) branch executes
... in the meantime ...
5. zookeeper_close() is called
6. the if (inc_ref_counter(zh,0)!=0) branch executes
7. cleanup_bufs() is called
8. input_buffer is freed at the end
... back in check_events() ...
9. queue_buffer() is called with a NULL buffer (b=0x0 in frame #0 above).

I believe the fix is to call only free_completions() in zookeeper_close(), not cleanup_bufs().
The original reason cleanup_bufs() was added was to call any outstanding synchronous completions, so only free_completions() (which is guarded) is needed. I will submit a patch for review with this change.
[jira] Updated: (ZOOKEEPER-805) four letter words fail with latest ubuntu nc.openbsd
[ https://issues.apache.org/jira/browse/ZOOKEEPER-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-805: --- Fix Version/s: (was: 3.3.2) 3.3.3 Not a blocker, pushing to 3.3.3/3.4.0. four letter words fail with latest ubuntu nc.openbsd Key: ZOOKEEPER-805 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-805 Project: Zookeeper Issue Type: Bug Components: documentation, server Affects Versions: 3.3.1, 3.4.0 Reporter: Patrick Hunt Priority: Critical Fix For: 3.3.3, 3.4.0 In both the 3.3 branch and trunk, "echo stat | nc localhost 2181" fails against the ZK server on Ubuntu Lucid Lynx. I noticed this after upgrading to Lucid Lynx, which now ships OpenBSD nc as the default (OpenBSD netcat, Debian patchlevel 1.89-3ubuntu2), vs. traditional nc [v1.10-38], which works fine. Not sure if this is a bug in our code or in nc.openbsd, but it's currently not working for me. Ugh.
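One way to rule nc in or out is to issue the four-letter word without netcat. This sketch assumes the server reads the 4-byte command, writes its reply, and then closes the connection, so the client simply reads until end of stream; FourLetterWord is a hypothetical class name, not part of ZooKeeper.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class FourLetterWord {
    // Sends a command such as "stat" or "ruok" and returns everything the
    // server writes back before it closes the connection.
    static String send(String host, int port, String cmd) throws IOException {
        try (Socket sock = new Socket(host, port)) {
            sock.setSoTimeout(5000); // don't hang forever on a silent server
            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = sock.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return reply.toString(StandardCharsets.UTF_8.name());
        }
    }
}
```

For example, FourLetterWord.send("localhost", 2181, "stat") returns the same text that a working nc would print; if this succeeds while nc.openbsd fails, the difference lies in how the two netcat variants handle the connection.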
[jira] Updated: (ZOOKEEPER-815) fill in TBDs in overview doc
[ https://issues.apache.org/jira/browse/ZOOKEEPER-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-815: --- Fix Version/s: (was: 3.3.2) 3.3.3 Not a blocker, pushing to 3.3.3/3.4.0. fill in TBDs in overview doc -- Key: ZOOKEEPER-815 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-815 Project: Zookeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.1 Reporter: Patrick Hunt Priority: Minor Fix For: 3.3.3, 3.4.0 Funny: "Ephemeral nodes are useful when you want to implement [tbd]." There are a few others in that doc that should really be fixed.
[jira] Updated: (ZOOKEEPER-907) Spurious KeeperErrorCode = Session moved messages
[ https://issues.apache.org/jira/browse/ZOOKEEPER-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-907: --- Status: Open (was: Patch Available) Cancelling the patch; it still needs a test. Ben, could you get back to Vishal on his question (see the latest comment)? Spurious KeeperErrorCode = Session moved messages --- Key: ZOOKEEPER-907 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-907 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-907.patch, ZOOKEEPER-907.patch_v2 The sync request does not set the session owner in the Request. As a result, the leader keeps printing:

2010-07-01 10:55:36,733 - INFO [ProcessThread:-1:preprequestproces...@405] - Got user-level KeeperException when processing sessionid:0x298d3b1fa9 type:sync: cxid:0x6 zxid:0xfffe txntype:unknown reqpath:/ Error Path:null Error:KeeperErrorCode = Session moved
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925546#action_12925546 ] Patrick Hunt commented on ZOOKEEPER-914: Thanks for the bug report. I've yet to find a codebase where I couldn't find what I consider bad programming, so I don't find that a constructive comment. We're happy you've joined the community; let's all work together to address these issues. Thanks. bq. points out to lack of failure tests for QuorumCnxManager We can always use more testing. If you want to contribute additional patches just for testing, please do so (I'm sure Flavio could give you some good ideas if you talk with him). Note that a number of tests already exercise this code (around 85% coverage); we'd need to figure out some way to simulate network failures and such, which in my experience is difficult: https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk/clover/org/apache/zookeeper/server/quorum/QuorumCnxManager.html QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Reporter: Vishal K Assignee: Vishal K Priority: Blocker This was a disaster. While testing our application, we ran into a scenario where a rebooted follower could not join the cluster.
Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():

Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        - locked 0x7fa93315f988 (a java.lang.Object)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)

I had pointed out this bug, along with several other problems in QuorumCnxManager, earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as part of ZOOKEEPER-822. I am working on a fix, and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. This also points to a lack of failure tests for QuorumCnxManager.
[jira] Updated: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-914: --- Component/s: server quorum Fix Version/s: 3.4.0 3.3.3 QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: quorum, server Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. While testing our application, we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():

Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        - locked 0x7fa93315f988 (a java.lang.Object)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)

I had pointed out this bug, along with several other problems in QuorumCnxManager, earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as part of ZOOKEEPER-822. I am working on a fix, and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. This also points to a lack of failure tests for QuorumCnxManager.
[jira] Updated: (ZOOKEEPER-912) ZooKeeper client logs trace and debug messages at level INFO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-912: --- Fix Version/s: 3.4.0 Assignee: Anthony Urso Issue Type: Improvement (was: Bug) IMO this is more of an improvement than a bug. See this section on logging for our general guidelines (granted, it's a very gray area): http://hadoop.apache.org/zookeeper/docs/current/zookeeperInternals.html#sc_logging I see some issues with this patch: while the original logging does make things a bit fuzzier, some of it is pretty critical information. I'd like to give detailed feedback and discuss, though. Apache just opened a Review Board instance; could you post the patch there for review? https://reviews.apache.org/dashboard/ Also, please post patches in ZOOKEEPER-###.patch form (this page also has details on how to create the patch file itself): http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute Thanks! ZooKeeper client logs trace and debug messages at level INFO Key: ZOOKEEPER-912 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-912 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Anthony Urso Priority: Minor Fix For: 3.4.0 Attachments: zk-loglevel.patch ZK logs a lot of uninformative trace and debug messages at level INFO. This fuzzes up everything and makes it easy to miss useful log info.
[jira] Updated: (ZOOKEEPER-913) Version parser fails to parse 3.3.2-dev from build.xml.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-913: --- Fix Version/s: 3.4.0 Assignee: Anthony Urso I like the version patch better. However, it should be made more general; we've had problems with this before. We should parse X.Y.Zblah, where anything after Z (Z is expected to be a number) gets put into some additional field (extension?). Version parser fails to parse 3.3.2-dev from build.xml. - Key: ZOOKEEPER-913 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-913 Project: Zookeeper Issue Type: Bug Components: build Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Anthony Urso Priority: Critical Fix For: 3.4.0 Attachments: zk-build.patch, zk-version.patch Cannot build 3.3.1 from the release tarball due to the VerGen parser's inability to parse 3.3.2-dev:

version-info:
     [java] All version-related parameters must be valid integers!
     [java] Exception in thread "main" java.lang.NumberFormatException: For input string: "2-dev"
     [java]     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
     [java]     at java.lang.Integer.parseInt(Integer.java:481)
     [java]     at java.lang.Integer.parseInt(Integer.java:514)
     [java]     at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131)
     [java] Java Result: 1
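The more general parsing suggested above (numeric X.Y.Z, with anything after Z kept in an extra field) might look like this. ParsedVersion and its field names are hypothetical; this is a sketch of the idea, not VerGen's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical version holder: X.Y.Z must be integers, and anything after Z
// is kept verbatim as a free-form qualifier (the "extension" field).
final class ParsedVersion {
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(.*)");

    final int major;
    final int minor;
    final int micro;
    final String qualifier; // e.g. "-dev"; empty for a plain release

    private ParsedVersion(int major, int minor, int micro, String qualifier) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
        this.qualifier = qualifier;
    }

    static ParsedVersion parse(String version) {
        Matcher m = VERSION.matcher(version.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid version: " + version);
        }
        return new ParsedVersion(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                m.group(4));
    }
}
```

Because the third group is greedy over digits only, "3.3.2-dev" yields micro 2 and qualifier "-dev", while "3.3.10" yields micro 10 and an empty qualifier.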
[jira] Commented: (ZOOKEEPER-912) ZooKeeper client logs trace and debug messages at level INFO
[ https://issues.apache.org/jira/browse/ZOOKEEPER-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925145#action_12925145 ] Patrick Hunt commented on ZOOKEEPER-912: I have issues with this patch, which I detailed at https://reviews.apache.org/r/7/ In particular, I think you can get the same effect through log4j.properties configuration (details in my review feedback). I'm -1 on making these changes. What do other people think: agree or disagree? ZooKeeper client logs trace and debug messages at level INFO Key: ZOOKEEPER-912 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-912 Project: Zookeeper Issue Type: Improvement Components: java client Affects Versions: 3.3.1 Reporter: Anthony Urso Assignee: Anthony Urso Priority: Minor Fix For: 3.4.0 Attachments: zk-loglevel.patch ZK logs a lot of uninformative trace and debug messages at level INFO. This fuzzes up everything and makes it easy to miss useful log info.
[jira] Commented: (ZOOKEEPER-706) large numbers of watches can cause session re-establishment to fail
[ https://issues.apache.org/jira/browse/ZOOKEEPER-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925222#action_12925222 ] Patrick Hunt commented on ZOOKEEPER-706: This is a pretty easy one for someone to fix, and as users set more and more watches, it would be good to get it addressed. large numbers of watches can cause session re-establishment to fail --- Key: ZOOKEEPER-706 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-706 Project: Zookeeper Issue Type: Bug Components: c client, java client Affects Versions: 3.1.2, 3.2.2, 3.3.0 Reporter: Patrick Hunt Priority: Critical Fix For: 3.4.0 If a client sets a large number of watches, the set-watches operation during session re-establishment can fail. For example:

WARN [NIOServerCxn.Factory:22801:nioserverc...@417] - Exception causing close of session 0xe727001201a4ee7c due to java.io.IOException: Len error 4348380

In this case the client was a web monitoring app and had set both data and child watches on 32k znodes. There are two issues I see here that we need to fix:
1) Handle this case properly (split up the set-watches request into multiple calls, I guess...).
2) The session should have expired after the timeout. However, we seem to treat any message from the client as resetting the expiration on the server side. We should probably only count messages the client sends during an established session; otherwise we can end up in this situation, where the session is not established but is not expired either. Perhaps we should create another JIRA for this particular issue.
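Item 1 (splitting the request) could look roughly like the following: group the watched paths into batches whose estimated serialized size stays under a limit, then send one request per batch. The per-path size estimate (a 4-byte length prefix plus the UTF-8 bytes of the path) and the limit passed in are assumptions for illustration, not the client's actual wire-format accounting or the server's configured maximum; WatchBatcher is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class WatchBatcher {
    // Assumed per-path cost: 4-byte length prefix + UTF-8 bytes of the path.
    static int estimatedSize(String path) {
        return 4 + path.getBytes(StandardCharsets.UTF_8).length;
    }

    // Splits paths into consecutive batches whose estimated size does not
    // exceed maxBytes. A single path larger than maxBytes gets its own batch.
    static List<List<String>> batch(List<String> paths, int maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentSize = 0;
        for (String path : paths) {
            int size = estimatedSize(path);
            if (!current.isEmpty() && currentSize + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(path);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

A real client would need to apply this across the data, exist and child watch lists together, since they travel in the same request; the sketch shows only the batching logic.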
[jira] Commented: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924669#action_12924669 ] Patrick Hunt commented on ZOOKEEPER-896: Ben/Mahadev, any comments on this approach? Botond, can you add a test for this? Improve C client to support dynamic authentication schemes -- Key: ZOOKEEPER-896 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Reporter: Botond Hejj Assignee: Botond Hejj Fix For: 3.4.0 Attachments: ZOOKEEPER-896.patch When we started exploring ZooKeeper for our requirements, we found that the authentication mechanism is not flexible enough. We want to use Kerberos for authentication, but with the current API we ran into a few problems. The idea is that we get a Kerberos token on the client side and then send that token to the server with a kerberos scheme. A server-side authentication plugin can use that token to authenticate the client and also use it for authorization. We ran into two problems with this approach:
1. A different Kerberos token is needed for each server the client can connect to, since Kerberos uses mutual authentication. That means when the client acquires the Kerberos token, it has to know which server it is connecting to and generate the token accordingly. Currently the client can't generate a token for a specific server: the token stored in auth_info is used for all servers.
2. The Kerberos token might have an expiry time, so if the client loses the connection to the server and then tries to reconnect, it should acquire a new token. That is currently not possible, since the token is stored in auth_info and reused for every connection.
The problem can be solved if we allow the client to register a callback for authentication instead of a static token. This can be a callback with an argument that passes the current host string.
The ZooKeeper client code could call this callback to get a fresh, server-specific token before it sends the authentication info to the server. This would solve our problem with Kerberos authentication and could also be used for other, more dynamic authentication schemes. The solution could also be generalized to the Java client.
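Since the proposal mentions generalizing to the Java client, here is a rough Java sketch of the callback idea. AuthTokenProvider, CallbackAuthClient and their methods are hypothetical names, not an existing ZooKeeper API; the point is only that the token is requested on every connection attempt, with the target host passed in, instead of being stored once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback: returns a fresh token for the server being contacted.
interface AuthTokenProvider {
    byte[] tokenFor(String host);
}

// Hypothetical stand-in for a client's connect path; it only records the
// auth data it would send, so the per-connection behaviour is visible.
class CallbackAuthClient {
    private final String scheme;
    private final AuthTokenProvider provider;
    private final List<String> sentAuth = new ArrayList<>();

    CallbackAuthClient(String scheme, AuthTokenProvider provider) {
        this.scheme = scheme;
        this.provider = provider;
    }

    // Invoked on every (re)connection: the token is obtained now, for this
    // host, instead of reusing one captured when the client was created.
    void onConnected(String host) {
        byte[] token = provider.tokenFor(host);
        sentAuth.add(scheme + ":" + new String(token, StandardCharsets.UTF_8));
    }

    List<String> sentAuth() {
        return sentAuth;
    }
}
```

Both problems in the description map onto this shape: the host argument covers per-server tokens, and calling the provider on each reconnect covers expiring tokens.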
[jira] Updated: (ZOOKEEPER-896) Improve C client to support dynamic authentication schemes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-896: --- Assignee: Botond Hejj Improve C client to support dynamic authentication schemes -- Key: ZOOKEEPER-896 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-896 Project: Zookeeper Issue Type: Improvement Components: c client Affects Versions: 3.3.1 Reporter: Botond Hejj Assignee: Botond Hejj Fix For: 3.4.0 Attachments: ZOOKEEPER-896.patch When we started exploring ZooKeeper for our requirements, we found that the authentication mechanism is not flexible enough. We want to use Kerberos for authentication, but with the current API we ran into a few problems. The idea is that we get a Kerberos token on the client side and then send that token to the server with a kerberos scheme. A server-side authentication plugin can use that token to authenticate the client and also use it for authorization. We ran into two problems with this approach:
1. A different Kerberos token is needed for each server the client can connect to, since Kerberos uses mutual authentication. That means when the client acquires the Kerberos token, it has to know which server it is connecting to and generate the token accordingly. Currently the client can't generate a token for a specific server: the token stored in auth_info is used for all servers.
2. The Kerberos token might have an expiry time, so if the client loses the connection to the server and then tries to reconnect, it should acquire a new token. That is currently not possible, since the token is stored in auth_info and reused for every connection.
The problem can be solved if we allow the client to register a callback for authentication instead of a static token. This can be a callback with an argument that passes the current host string.
The ZooKeeper client code could call this callback to get a fresh, server-specific token before it sends the authentication info to the server. This would solve our problem with Kerberos authentication and could also be used for other, more dynamic authentication schemes. The solution could also be generalized to the Java client.
[jira] Updated: (ZOOKEEPER-904) super digest is not actually acting as a full superuser
[ https://issues.apache.org/jira/browse/ZOOKEEPER-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-904: --- Fix Version/s: 3.3.2 We should consider this for 3.3.2 as well, or at least 3.3.3. super digest is not actually acting as a full superuser --- Key: ZOOKEEPER-904 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Camille Fournier Assignee: Camille Fournier Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-904.patch The documentation states: "New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a super user. In particular, no ACL checking occurs for a user authenticated as super." However, if a super user does something like zk.setACL("/", Ids.READ_ACL_UNSAFE, -1); the super user is now bound by the read-only ACL. This is not what I would expect given the documentation. It can be fixed by moving the check for the super authId in PrepRequestProcessor.checkACL to before the for (ACL a : acl) loop.
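The proposed reordering can be illustrated with a simplified, hypothetical model: the super check runs before any ACL entry is examined, so a node ACL like READ_ACL_UNSAFE (modelled here as a single world:anyone READ entry) cannot lock out the super user. AclCheck, its Entry type, and the permission bit values are illustrative only, not ZooKeeper's PrepRequestProcessor.checkACL.

```java
import java.util.List;

// Simplified, hypothetical model of a node ACL check; it is not ZooKeeper's
// PrepRequestProcessor.checkACL and the permission bits are illustrative.
class AclCheck {
    static final int READ = 1;
    static final int WRITE = 2;

    // One ACL entry: a permission mask granted to the identity "scheme:id".
    static final class Entry {
        final int perms;
        final String identity; // e.g. "world:anyone" or "digest:bob"

        Entry(int perms, String identity) {
            this.perms = perms;
            this.identity = identity;
        }
    }

    // authIds holds the caller's authenticated identities as "scheme:id".
    static boolean permitted(List<Entry> acl, List<String> authIds, int perm) {
        // Super check first, before looking at the ACL at all, so a super user
        // is never restricted by the node's ACL (the ordering the issue proposes).
        for (String auth : authIds) {
            if (auth.startsWith("super:")) {
                return true;
            }
        }
        for (Entry entry : acl) {
            if ((entry.perms & perm) == 0) {
                continue; // this entry does not grant the requested permission
            }
            if (entry.identity.equals("world:anyone") || authIds.contains(entry.identity)) {
                return true;
            }
        }
        return false;
    }
}
```

With the super check placed after (or inside) the loop instead, a read-only ACL would deny the write before the super identity is ever considered, which is the behaviour the issue reports.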
[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-905: --- Fix Version/s: 3.4.0 Assignee: Nicholas Harteau enhance zkServer.sh for easier zookeeper automation-izing - Key: ZOOKEEPER-905 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905 Project: Zookeeper Issue Type: Improvement Components: scripts Reporter: Nicholas Harteau Assignee: Nicholas Harteau Priority: Minor Fix For: 3.4.0 Attachments: zkServer.sh.diff zkServer.sh is good at starting ZooKeeper and figuring out the right options to pass along. Unfortunately, if you want to wrap ZooKeeper startup/shutdown in any significant way, you have to reimplement a bunch of the logic there. The attached patch addresses a few simple issues:
1. Add a 'start-foreground' option to zkServer.sh. This allows tools that expect to manage a foregrounded process (daemontools, launchd, etc.) to use zkServer.sh instead of rolling their own way to launch ZooKeeper.
2. Add a 'print-cmd' option to zkServer.sh: rather than launching ZooKeeper from the script, just print the command you'd normally use to exec ZooKeeper. I found this useful when writing automation to start/stop ZooKeeper as part of smoke testing ZooKeeper-based applications.
3. Deal more gracefully with supplying alternate configuration files to ZooKeeper; currently the script assumes all config files reside in $ZOOCFGDIR. This is also useful for smoke testing.
4. Communicate extra info about ZooKeeper (JMX enabled) on STDERR rather than STDOUT (necessary for #2).
5. Fix an issue on macOS where readlink doesn't have the '-f' option.