[jira] Commented: (ZOOKEEPER-759) Stop accepting connections when close to file descriptor limit
[ https://issues.apache.org/jira/browse/ZOOKEEPER-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862303#action_12862303 ]

Ted Dunning commented on ZOOKEEPER-759:
---------------------------------------

This is a Unix-specific bean, so don't forget to defang the test if the bean isn't available.

> Stop accepting connections when close to file descriptor limit
> --------------------------------------------------------------
>
> Key: ZOOKEEPER-759
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-759
> Project: Zookeeper
> Issue Type: Improvement
> Components: server
> Reporter: Travis Crawford
>
> Zookeeper always tries to accept new connections, throwing an exception if out of file descriptors. An improvement would be denying new client connections when close to the limit.
> Additionally, file-descriptor limits+usage should be exported to the monitoring four-letter word, should that get implemented (see ZOOKEEPER-744).
>
> DETAILS
>
> A Zookeeper ensemble I administer recently suffered an outage when one node was restarted with the low system-default ulimit of 1024 file descriptors and later ran out. File descriptor usage+max are already being monitored by the following MBeans:
> - java.lang.OperatingSystem.MaxFileDescriptorCount
> - java.lang.OperatingSystem.OpenFileDescriptorCount
> They're described (rather tersely) at:
> http://java.sun.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html
>
> This feature request is for the following:
> (a) Stop accepting new connections when OpenFileDescriptorCount is close to MaxFileDescriptorCount, defaulting to 95% FD usage. New connections should be denied, logged to disk at debug level, and increment a ``ConnectionDeniedCount`` MBean counter.
> (b) Begin accepting new connections when usage drops below some configurable threshold, defaulting to 90% of FD usage, basically the high/low watermark model.
> (c) Update the administrator's guide with a comment about using an appropriate FD limit.
> (d) Extra credit: if ZOOKEEPER-744 is implemented, export statistics for:
>     zookeeper_open_file_descriptor_count
>     zookeeper_max_file_descriptor_count
>     zookeeper_max_file_descriptor_mismatch - boolean, exported by leader, if not all zk's have the same max FD value

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
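
The high/low watermark model in (a) and (b), together with the caveat that the bean is Unix-specific, might be sketched roughly as follows. This is an illustration only, not ZooKeeper code: the class name `FdWatermarkGate` and its methods are hypothetical, and the 0.95/0.90 thresholds are just the defaults proposed in the issue. The only real APIs used are `ManagementFactory.getOperatingSystemMXBean()` and `com.sun.management.UnixOperatingSystemMXBean`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

/** Hypothetical sketch, not ZooKeeper code: high/low watermark gate for new connections. */
final class FdWatermarkGate {
    private final double high;     // stop accepting at or above this usage fraction
    private final double low;      // resume accepting once usage falls below this fraction
    private boolean accepting = true;
    private long deniedCount = 0;  // would back the proposed ConnectionDeniedCount counter

    FdWatermarkGate(double high, double low) {
        if (!(low > 0.0 && low < high && high <= 1.0)) {
            throw new IllegalArgumentException("need 0 < low < high <= 1");
        }
        this.high = high;
        this.low = low;
    }

    /** Decides whether one new connection may be accepted, given current FD counts. */
    synchronized boolean tryAccept(long open, long max) {
        if (max <= 0) {
            return true; // limit unknown: behave as if no gate were installed
        }
        double usage = (double) open / (double) max;
        if (accepting && usage >= high) {
            accepting = false;      // crossed the high watermark
        } else if (!accepting && usage < low) {
            accepting = true;       // dropped back below the low watermark
        }
        if (!accepting) {
            deniedCount++;
        }
        return accepting;
    }

    synchronized long deniedCount() {
        return deniedCount;
    }

    /** Returns {open, max}, or null when the Unix-specific bean is not available. */
    static long[] readFdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()
            };
        }
        return null; // e.g. non-Unix JVMs: callers (and tests) should skip the check
    }
}
```

Keeping the threshold logic separate from the MBean lookup means the gate can be tested with made-up counts, and a test that needs real counts can simply skip itself when `readFdCounts()` returns null.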
[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover
[ https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772020#action_12772020 ]

Ted Dunning commented on ZOOKEEPER-22:
--------------------------------------

Is there progress on this issue?

> Automatic request retries on connect failover
> ---------------------------------------------
>
> Key: ZOOKEEPER-22
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-22
> Project: Zookeeper
> Issue Type: New Feature
> Components: c client, java client
> Reporter: Patrick Hunt
> Assignee: Mahadev konar
> Fix For: 3.3.0
>
> Moved from SourceForge to Apache.
> http://sourceforge.net/tracker/index.php?func=detail&aid=1831412&group_id=209147&atid=1008547
>
> When a connection to a ZooKeeper server fails, all of the pending requests will return an error. In reality the requests should be resubmitted when the client reestablishes a connection to ZooKeeper. For read requests, it's no big deal to just reissue the request. For update requests, the ZooKeeper must be able to detect if the request has been processed and, if so, return the result of the previous execution; otherwise, it should process the request.
[jira] Commented: (ZOOKEEPER-22) Automatic request retries on connect failover
[ https://issues.apache.org/jira/browse/ZOOKEEPER-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772165#action_12772165 ]

Ted Dunning commented on ZOOKEEPER-22:
--------------------------------------

I wouldn't call it laziness. At most distraction.

But a lot of ZK users will breathe a sigh of relief when this fix gets deployed! Thanks for your efforts on this.
[jira] Created: (ZOOKEEPER-556) Startup messages should account for common error of missing leading slash in config files
Startup messages should account for common error of missing leading slash in config files
-----------------------------------------------------------------------------------------

Key: ZOOKEEPER-556
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-556
Project: Zookeeper
Issue Type: Bug
Reporter: Ted Dunning

It would be nice if the startup noticed directories without a leading slash in the config file. That is worth a warning.

Moreover, if that directory exists looking from root, but not looking from the current directory, a very serious warning is in order.
[jira] Commented: (ZOOKEEPER-556) Startup messages should account for common error of missing leading slash in config files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768937#action_12768937 ]

Ted Dunning commented on ZOOKEEPER-556:
---------------------------------------

In the following exchange, a new Zookeeper user lost several days wrestling with this issue. Henry spotted the problem. I didn't. Patrick didn't.

With a prominent error message, the user would have found this in 5 minutes.

{noformat}
yeah - thought this was it: you've missed the forward slash on home/mark/zookeeper (this turned up on your exception message).

On Thu, Oct 22, 2009 at 2:55 PM, Mark Vigeant mark.vige...@riskmetrics.com wrote:

Yeah I just figured out the problem with zoocfg.py I am running as the same user who created myid. Here's my config:

zoo.cfg
tickTime-2000
dataDir=home/mark/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1= hermes:2888:3888
server.2= leela:2888:3888

on the machines hermes and leela I've put myid files in /home/mark/zookeeper with the numbers 1 and 2 respectively
{noformat}
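
The check requested in ZOOKEEPER-556 could look roughly like the sketch below. It is an assumption about how such a check might be written, not what ZooKeeper's startup code does: `DataDirCheck`, `Severity` and `check` are hypothetical names, and the root and current directories are passed in explicitly only so that the logic can be exercised without touching the real filesystem root.

```java
import java.io.File;

/** Hypothetical helper, not part of ZooKeeper: classifies a configured directory path. */
final class DataDirCheck {
    enum Severity { OK, WARN, SEVERE }

    /**
     * @param configured the path as written in the config file, e.g. "home/mark/zookeeper"
     * @param root       the filesystem root to resolve against (normally new File("/"))
     * @param cwd        the directory the server was started from
     */
    static Severity check(String configured, File root, File cwd) {
        if (new File(configured).isAbsolute()) {
            return Severity.OK; // leading slash present, nothing to report
        }
        boolean existsFromRoot = new File(root, configured).isDirectory();
        boolean existsFromCwd = new File(cwd, configured).isDirectory();
        if (existsFromRoot && !existsFromCwd) {
            // Almost certainly a forgotten leading slash: the "very serious warning" case.
            return Severity.SEVERE;
        }
        return Severity.WARN; // relative path: worth a warning in any case
    }

    public static void main(String[] args) {
        File cwd = new File(System.getProperty("user.dir"));
        Severity s = check("home/mark/zookeeper", new File("/"), cwd);
        System.out.println("dataDir=home/mark/zookeeper -> " + s);
    }
}
```

With the configuration quoted above, `/home/mark/zookeeper` existed on the machines while `home/mark/zookeeper` relative to the startup directory presumably did not, which is the case this sketch reports as `SEVERE`.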
Re: feedback zkclient
I think that another way to say this is that zkClient is leaning toward the Spring philosophy: if the caller can't (or won't) handle the situation, then they shouldn't be forced to declare it. The Spring jdbcTemplate is a grand example of the benefits of this.

First implementations of this policy generally are a bit too broad, though, so this should be examined carefully.

On Thu, Oct 1, 2009 at 8:05 AM, Peter Voss i...@petervoss.org wrote:

>> 5) there's a lot of wrapping of exceptions, looks like this is done in order to make them unchecked. Is this wise? How much simpler does it really make things? Esp things like interrupted exception? As you mentioned, one of your intents is to simplify things, but perhaps too simple? Some short, clear examples of usage would be helpful here to compare/contrast, I took a very quick look at some of the tests but that didn't help much. Is there a test(s) in particular that I should look at to see how zkclient is used, and the benefits incurred?
>
> Checked exceptions are very painful when you are assembling together a larger number of libraries (which is true for most enterprise applications). Either you wind up having a general throws Exception (which I don't really like, because it's too general) at most of your interfaces, or you have to wrap checked exceptions into runtime exceptions.
>
> We didn't want a library to introduce yet another checked exception that you MUST catch or rethrow. I know that there are different opinions about that, but that's the idea behind this.
>
> Similar situation for the InterruptedException. ZkClient also converts this to a runtime exception and makes sure that the interrupted flag doesn't get cleared. There are just too many existing libraries that have a catch (Exception e) somewhere that totally ignores that this would reset the interrupt flag, if e is an InterruptedException. Therefore we had better avoid having all of the methods throw that exception.

--
Ted Dunning, CTO
DeepDyve
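
The InterruptedException handling described in the quoted reply (convert to a runtime exception, but keep the interrupted flag set) can be illustrated with the following minimal sketch. The names `UncheckedInterruptedException`, `Interrupts` and `InterruptibleCall` are made up for this example and are not claimed to be ZkClient's actual classes; only `Thread.currentThread().interrupt()` and `Thread.sleep` are standard library calls.

```java
/** Hypothetical unchecked wrapper; not claimed to be ZkClient's own exception type. */
final class UncheckedInterruptedException extends RuntimeException {
    UncheckedInterruptedException(InterruptedException cause) {
        super(cause);
    }
}

final class Interrupts {
    /** A call that may block and be interrupted. */
    interface InterruptibleCall<T> {
        T call() throws InterruptedException;
    }

    /** Runs the call; on interruption, restores the flag and rethrows unchecked. */
    static <T> T unchecked(InterruptibleCall<T> body) {
        try {
            return body.call();
        } catch (InterruptedException e) {
            // Throwing InterruptedException cleared the thread's interrupted status,
            // so set it again before converting to a runtime exception.
            Thread.currentThread().interrupt();
            throw new UncheckedInterruptedException(e);
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate an interrupt arriving
        try {
            unchecked(() -> { Thread.sleep(1000); return null; });
        } catch (UncheckedInterruptedException e) {
            System.out.println("flag still set: " + Thread.currentThread().isInterrupted());
        }
    }
}
```

Because the flag is restored, code further up the stack that only has a broad `catch (Exception e)` can still discover the interruption through `Thread.currentThread().isInterrupted()`.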
Re: feedback zkclient
There is no way to avoid this entirely without a massive performance loss, because the connection can be lost while the confirmation is on its way back.

You may be able to tell if the file is yours by examining the content and ownership, but this is pretty implementation dependent. In particular, it makes queues very difficult to implement correctly.

If this happens during the creation of an ephemeral file, the only option may be to close the connection (thus deleting all ephemeral files) and start over.

On Thu, Oct 1, 2009 at 8:05 AM, Peter Voss i...@petervoss.org wrote:

>> 3) there's definitely an issue in the retryUntilConnected logic that you need to address. Let's say you call zkclient.create, and the connection to the server is lost while the request is in flight. At this point ConnectionLoss is thrown on the client side, however you (client) have no information on whether the server has made the change or not. The retry method's while loop will re-run the create (after reconnect), and the result seen by the caller (user code) could be either OK or may be NODEEXISTS exception, there's no way to know which. Mahadev is working on ZOOKEEPER-22 which will address this issue, but that's a future version, not today.
>
> Good catch. I wasn't aware that nodes could still have been created when receiving a ConnectionLoss. But how would you deal with that? If we create a znode and get a ConnectionLoss exception, then wait until the connection is back and check if the znode is there. There is no way of knowing whether it was us who created the node or somebody else, right?

--
Ted Dunning, CTO
DeepDyve
Re: feedback zkclient
That looks really lovely.

Judging by history and the fact that only 40/127 issues are resolved, 3.3 is probably 3-6 months away. Is that a fair assessment?

On Thu, Oct 1, 2009 at 11:13 AM, Patrick Hunt ph...@apache.org wrote:

> One nice thing about ephemeral is that the Stat contains the owner sessionid.
>
> As you say, it's highly implementation dependent. It's also something we recognize is a problem for users, we've slated it for 3.3.0
> http://issues.apache.org/jira/browse/ZOOKEEPER-22

--
Ted Dunning, CTO
DeepDyve
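
The owner-session check that Patrick mentions for ephemeral nodes could be used along the following lines when a create is retried after a ConnectionLoss. This is a self-contained simulation under stated assumptions, not the real client API: `Session`, `FlakySession`, `ConnectionLoss` and `NodeExists` here are hypothetical stand-ins defined inside the example. In the real Java client the corresponding information should come from `Stat.getEphemeralOwner()` and `ZooKeeper.getSessionId()`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a ZooKeeper session; not the real client API. */
final class EphemeralCreateRetry {
    static final class ConnectionLoss extends Exception {}
    static final class NodeExists extends Exception {}

    /** The minimal operations this sketch needs from a session. */
    interface Session {
        long sessionId();
        void createEphemeral(String path) throws ConnectionLoss, NodeExists;
        /** Session id owning the ephemeral node, or null if the node is absent. */
        Long ephemeralOwner(String path) throws ConnectionLoss;
    }

    /** Returns true if, after the call, the node exists and is owned by this session. */
    static boolean createOwned(Session s, String path, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                s.createEphemeral(path);
                return true;
            } catch (NodeExists e) {
                try {
                    Long owner = s.ephemeralOwner(path);
                    if (owner != null) {
                        // Ours only if an earlier attempt succeeded before its reply was lost.
                        return owner == s.sessionId();
                    }
                    // Node vanished in between: fall through and try the create again.
                } catch (ConnectionLoss again) {
                    // Could not check ownership; retry the whole step.
                }
            } catch (ConnectionLoss e) {
                // Outcome unknown: the create may or may not have been applied. Retry.
            }
        }
        return false;
    }

    /** In-memory fake: the first create is applied, but its reply is "lost". */
    static final class FlakySession implements Session {
        private final Map<String, Long> owners = new HashMap<>();
        private final long id;
        private boolean dropNextReply = true;

        FlakySession(long id) { this.id = id; }

        public long sessionId() { return id; }

        public void createEphemeral(String path) throws ConnectionLoss, NodeExists {
            if (owners.containsKey(path)) throw new NodeExists();
            owners.put(path, id);
            if (dropNextReply) {
                dropNextReply = false;
                throw new ConnectionLoss();
            }
        }

        public Long ephemeralOwner(String path) { return owners.get(path); }

        /** Simulates a node created by some other session. */
        void putForeign(String path, long otherId) { owners.put(path, otherId); }
    }
}
```

The sketch only covers ephemeral nodes, where an owner session id is available; persistent nodes carry no such owner, which is the implementation-dependent part discussed in this thread.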
Re: Show your ZooKeeper pride!
How come Yahoo isn't listed?

On Mon, Jun 8, 2009 at 6:31 PM, Patrick Hunt ph...@apache.org wrote:

> The Hadoop summit is Wednesday. If you're attending please feel free to say hi -- Mahadev is presenting @4, Ben and I will be attending as well.
>
> Also, regardless of whether you're attending or not we'd appreciate any updates to the powered by page, if you're too busy to update it yourself send us a snippet and we'll update it for you ;-)
> http://wiki.apache.org/hadoop/ZooKeeper/PoweredBy
>
> Regards,
> Patrick

--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
[jira] Created: (ZOOKEEPER-418) Need nifty zookeeper browser
Need nifty zookeeper browser
----------------------------

Key: ZOOKEEPER-418
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-418
Project: Zookeeper
Issue Type: Bug
Reporter: Ted Dunning

It would be very nice to have a browser that would allow the state of a Zoo to be examined. Even nicer would be such a utility that showed changes in real time.
[jira] Updated: (ZOOKEEPER-418) Need nifty zookeeper browser
[ https://issues.apache.org/jira/browse/ZOOKEEPER-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated ZOOKEEPER-418:
----------------------------------

Attachment: zk-view-0.1.tgz

Here is a first stab at recreating our internal tool with nice upgrades like real-time updates for file and directory contents. I have never built any Swing UIs before, so there are bound to be infelicities galore. Please help.

There are some warts:
1) You can't open a file that has children.
2) Opening non-text files is bad juju.
3) There seems to be a problem with the way the watchers are glued in place. If you create a file, it appears, but if you create children for it, it doesn't turn into a folder. The work-around is to simply restart the browser.
[jira] Updated: (ZOOKEEPER-418) Need nifty zookeeper browser
[ https://issues.apache.org/jira/browse/ZOOKEEPER-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated ZOOKEEPER-418:
----------------------------------

Attachment: screenshot-1.jpg

Here is a simple example on a live ZK.