Patrick: I appreciate your detailed response. I haven't finished my work on ZOOKEEPER-1407 :-( so I don't think I have the bandwidth to start working on another ZooKeeper issue.
Near term, if we can find a way for a shell script to detect the absence of a particular ZooKeeper node, rolling-restart.sh can be restored. Otherwise we may need to remove it.

FYI, as an HBase committer, I often need to finish incomplete features such as HBASE-3996. This takes a significant amount of time.

Cheers

On Tue, Mar 20, 2012 at 9:16 AM, Patrick Hunt <[email protected]> wrote:
> On Tue, Mar 20, 2012 at 6:57 AM, Ted Yu <[email protected]> wrote:
> > I looked at the patch for ZOOKEEPER-1059, which should have converted the
> > NPE to KeeperException.NoNodeException.
> >
> > Why would the 'zkcli stat' command return 0 when the hbase master znode
> > expires?
> >
> > Advice is appreciated.
>
> Hi Ted, sorry to see you're having trouble. I think I see the
> disconnect. ZooKeeperMain is first and foremost a user shell. As such
> it should not exit unless the quit command is run (or it is killed
> explicitly, etc.). In this case ZOOKEEPER-1059 is fixing a bug in
> the shell. It is indeed converting the NPE into a NoNodeException,
> which the shell then converts into an error message to the user, and
> continues. Prior to this patch the shell was failing on the NPE, which
> then generated the non-0 exit from the process.
>
> Note that trunk has some further improvements along these lines that
> you might also run into at some point in the future (3.5+):
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-271
> https://issues.apache.org/jira/browse/ZOOKEEPER-1391
> https://issues.apache.org/jira/browse/ZOOKEEPER-1307
>
> I think what we need is a tool that's intended for use both
> programmatically and by humans, with stricter requirements about
> input, output formatting, command handling, etc. Please see the
> work Hartmut has been doing as part of 271 on trunk (3.5.0). Perhaps
> we can augment these new classes to also support such a tool. However,
> it should be a true command line tool rather than a shell.
> Would you be available to work on this?
>
> Patrick
>
> ps. Bigtop is now helping to verify cross-project compatibility; it
> would be great if you could introduce some hbase tests that would
> flag these breakages in the future. When Bigtop does its integration (i.e.,
> runs the hbase tests using the corresponding version of zk) it would
> find these problems. We'd catch them much earlier. Thanks!
>
> >
> > FYI Jon filed a JIRA for the issue below, which is a blocker for HBase trunk.
> >
> > On Tue, Mar 20, 2012 at 12:36 AM, Jonathan Hsieh <[email protected]> wrote:
> >
> >> I'm trying to test HBASE-5589 -- to see if I can add an API call to
> >> HMasterInterface and do a rolling-restart / upgrade on a live cluster,
> >> which led me down another rabbit hole.
> >>
> >> I'm wondering how the rolling-restart.sh script worked in the past (I could
> >> spend more time setting up an older version to test this, but figured I'd ask).
> >>
> >> I'm getting stuck when bin/rolling-restart.sh tries to wait until the
> >> Master ZNode expires. In this particular case, the script seems to hang
> >> there forever (even after the /hbase/master ephemeral node expires).
> >>
> >> Here's the code in the script:
> >> ----
> >> # make sure the master znode has been deleted before continuing
> >> zparent=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.parent`
> >> if [ "$zparent" == "null" ]; then zparent="/hbase"; fi
> >> zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
> >> if [ "$zmaster" == "null" ]; then zmaster="master"; fi
> >> zmaster=$zparent/$zmaster
> >> echo -n "Waiting for Master ZNode ${zmaster} to expire"
> >> while bin/hbase zkcli stat $zmaster >/dev/null 2>&1; do
> >>   echo -n "."
> >>   sleep 1
> >> done
> >> echo #force a newline
> >> ----
> >>
> >> The problem is that 'bin/hbase zkcli stat /hbase/master ...' always seems
> >> to return with $? == 0, regardless of whether the znode is present or
> >> not! I've checked with Patrick Hunt (ZK committer) and this is the
> >> expected behavior. The only non-zero return codes are for abnormal exits
> >> (exceptions thrown).
> >>
> >> Here's the ZK code I was looking through:
> >>
> >> https://github.com/apache/zookeeper/blob/release-3.4.3/src/java/main/org/apache/zookeeper/ZooKeeperMain.java#L736
> >>
> >> https://github.com/apache/zookeeper/blob/release-3.4.3/src/java/main/org/apache/zookeeper/ZooKeeper.java#L980
> >>
> >> Thoughts?
> >>
> >> Jon.
> >>
> >> --
> >> // Jonathan Hsieh (shay)
> >> // Software Engineer, Cloudera
> >> // [email protected]
> >>
>
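A minimal sketch of the "detect the absence of a znode from a shell script" idea discussed above: because `zkcli stat` exits 0 either way, it looks at the shell's output instead of the exit code. This assumes the ZooKeeperMain in use prints a message containing "Node does not exist" when stat hits a NoNodeException (verify this against your ZooKeeper version). The helper names `znode_absent` and `wait_for_znode_expiry` and the `max_tries` cap are hypothetical; they are not existing HBase functions.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: when the znode is missing, the ZooKeeper shell
# prints text containing "Node does not exist" but still exits 0, so we
# inspect its combined stdout/stderr rather than trusting $?.

# Hypothetical helper: succeeds only if `stat <path>` reports a missing node.
# Usage: znode_absent <path> <command that launches the ZK shell...>
znode_absent() {
  local path=$1; shift
  # "$@" is e.g. `bin/hbase zkcli`. grep is used without -q so it reads all
  # input, avoiding a SIGPIPE on the ZK shell if the caller sets pipefail.
  "$@" stat "$path" 2>&1 | grep "Node does not exist" >/dev/null
}

# Hypothetical wrapper: poll once per second until the znode is gone,
# giving up after max_tries checks so a restart cannot hang forever.
# Usage: wait_for_znode_expiry <path> <max_tries> <ZK shell command...>
wait_for_znode_expiry() {
  local path=$1 max_tries=$2; shift 2
  local tries=0
  echo -n "Waiting for ZNode ${path} to expire"
  until znode_absent "$path" "$@"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo " timed out"
      return 1
    fi
    echo -n "."
    sleep 1
  done
  echo   # force a newline
}
```

For example, the loop in rolling-restart.sh could become `wait_for_znode_expiry "$zmaster" 300 bin/hbase zkcli || exit 1`. Matching on message text is brittle across ZooKeeper releases, which is why a dedicated command line tool with well-defined exit codes, as Patrick suggests, would be the more robust long-term fix.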
