That's great news, glad it worked out. Thanks for the update, Ted.

Patrick

On Tue, Mar 20, 2012 at 3:33 PM, Ted Yu <[email protected]> wrote:
> We're using the trick Patrick proposed, see:
> https://issues.apache.org/jira/browse/HBASE-5603
>
> FYI
>
> On Tue, Mar 20, 2012 at 10:14 AM, Patrick Hunt <[email protected]> wrote:
>
>> Great. Thanks Ted.
>>
>> On Tue, Mar 20, 2012 at 10:09 AM, Ted Yu <[email protected]> wrote:
>> > Patrick:
>> > I logged https://issues.apache.org/jira/browse/ZOOKEEPER-1428
>> >
>> > If you feel there is anything missing in the JIRA, feel free to add it.
>> >
>> > Thanks for your help on this issue.
>> >
>> > Cheers
>> >
>> > On Tue, Mar 20, 2012 at 9:42 AM, Patrick Hunt <[email protected]> wrote:
>> >
>> >> On Tue, Mar 20, 2012 at 9:32 AM, Ted Yu <[email protected]> wrote:
>> >> > In the near term, if we can find a way for a shell script to detect the
>> >> > absence of a particular ZooKeeper node, rolling-restart.sh can be
>> >> > restored. Otherwise we may need to remove it.
>> >>
>> >> I just tested this out with 3.4, and I see the following for statting
>> >> a non-existent znode:
>> >>
>> >> [zk: (CONNECTED) 1] stat /foobar
>> >> Node does not exist: /foobar
>> >>
>> >> vs statting one that does exist:
>> >>
>> >> [zk: (CONNECTED) 2] stat /
>> >> cZxid = 0x0
>> >> ctime = Wed Dec 31 16:00:00 PST 1969
>> >> mZxid = 0x0
>> >> mtime = Wed Dec 31 16:00:00 PST 1969
>> >> pZxid = 0x0
>> >> cversion = -1
>> >> dataVersion = 0
>> >> aclVersion = 0
>> >> ephemeralOwner = 0x0
>> >> dataLength = 0
>> >> numChildren = 1
>> >>
>> >> You can look for "^Node does not exist" in the stat output instead of
>> >> checking the exit code. This would get around the problem until a more
>> >> permanent solution could be found.
>> >>
>> >> I hear you re: being time-bound (I'd love to work on this myself). In
>> >> that case, would you mind creating a JIRA based on my suggestion of a
>> >> new command-line tool, giving your HBase case as an example along with
>> >> any requirements you can think of? Perhaps Hartmut or one of the other
>> >> contributors might be interested in working on this.
>> >> https://issues.apache.org/jira/browse/ZOOKEEPER
>> >>
>> >> Patrick
>> >>
>> >> >
>> >> > On Tue, Mar 20, 2012 at 9:16 AM, Patrick Hunt <[email protected]> wrote:
>> >> >
>> >> >> On Tue, Mar 20, 2012 at 6:57 AM, Ted Yu <[email protected]> wrote:
>> >> >> > I looked at the patch for ZOOKEEPER-1059, which should have converted
>> >> >> > the NPE to KeeperException.NoNodeException.
>> >> >> >
>> >> >> > Why would the 'zkcli stat' command return 0 when the hbase master
>> >> >> > znode has expired?
>> >> >> >
>> >> >> > Advice is appreciated.
>> >> >>
>> >> >> Hi Ted, sorry to see you're having trouble. I think I see the
>> >> >> disconnect. ZooKeeperMain is first and foremost a user shell. As such
>> >> >> it should not exit unless the quit command is run (or it is killed
>> >> >> explicitly, etc.). In this case ZOOKEEPER-1059 is fixing a bug in
>> >> >> the shell. It does indeed convert the NPE into a NoNodeException,
>> >> >> which the shell then turns into an error message for the user before
>> >> >> continuing. Prior to this patch the shell was failing on the NPE, which
>> >> >> then produced the non-zero exit from the process.
>> >> >>
>> >> >> Note that trunk has some further improvements along these lines that
>> >> >> you might also run into at some point in the future (3.5+):
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/ZOOKEEPER-271
>> >> >> https://issues.apache.org/jira/browse/ZOOKEEPER-1391
>> >> >> https://issues.apache.org/jira/browse/ZOOKEEPER-1307
>> >> >>
>> >> >> I think what we need is a tool that's intended for use both
>> >> >> programmatically and by humans, with stricter requirements on
>> >> >> input, output formatting, command handling, etc. Please see the
>> >> >> work Hartmut has been doing as part of ZOOKEEPER-271 on trunk (3.5.0).
>> >> >> Perhaps we can augment these new classes to also support such a tool.
>> >> >> However, it should be a true command-line tool rather than a shell.
>> >> >> Would you be available to work on this?
>> >> >>
>> >> >> Patrick
>> >> >>
>> >> >> ps. Bigtop is now helping to verify cross-project compatibility; it
>> >> >> would be great if you could introduce some HBase tests that would
>> >> >> flag these breakages in the future. When Bigtop does its integration
>> >> >> (i.e. runs the HBase tests against the corresponding version of ZK), it
>> >> >> would find these problems and we'd catch them much earlier. Thanks!
>> >> >>
>> >> >>
>> >> >> > FYI Jon filed a JIRA for the issue below which is a blocker for HBase
>> >> >> > trunk.
>> >> >> >
>> >> >> > On Tue, Mar 20, 2012 at 12:36 AM, Jonathan Hsieh <[email protected]> wrote:
>> >> >> >
>> >> >> >> I'm trying to test HBASE-5589 -- to see if I can add an API call to
>> >> >> >> HMasterInterface and do a rolling restart / upgrade on a live
>> >> >> >> cluster -- which led me down another rabbit hole.
>> >> >> >>
>> >> >> >> I'm wondering how the rolling-restart.sh script worked in the past (I
>> >> >> >> could spend more time setting up an older version to test this, but
>> >> >> >> figured I'd ask).
>> >> >> >>
>> >> >> >> I'm getting stuck when bin/rolling-restart.sh tries to wait until the
>> >> >> >> Master ZNode expires. In this particular case, the script seems to
>> >> >> >> hang there forever (even after the /hbase/master ephemeral node
>> >> >> >> expires).
>> >> >> >>
>> >> >> >> Here's the code in the script:
>> >> >> >> ----
>> >> >> >> # make sure the master znode has been deleted before continuing
>> >> >> >>    zparent=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.parent`
>> >> >> >>    if [ "$zparent" == "null" ]; then zparent="/hbase"; fi
>> >> >> >>    zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
>> >> >> >>    if [ "$zmaster" == "null" ]; then zmaster="master"; fi
>> >> >> >>    zmaster=$zparent/$zmaster
>> >> >> >>    echo -n "Waiting for Master ZNode ${zmaster} to expire"
>> >> >> >>    while bin/hbase zkcli stat $zmaster >/dev/null 2>&1; do
>> >> >> >>      echo -n "."
>> >> >> >>      sleep 1
>> >> >> >>    done
>> >> >> >>    echo #force a newline
>> >> >> >> ----
>> >> >> >>
>> >> >> >> The problem is that 'bin/hbase zkcli stat /hbase/master ...' seems to
>> >> >> >> always return with $? == 0 regardless of whether the znode is present
>> >> >> >> or not! I've checked with Patrick Hunt (ZK committer) and this is the
>> >> >> >> expected behavior. The only non-zero return codes are for abnormal
>> >> >> >> exits (exceptions thrown).
>> >> >> >>
>> >> >> >> Here's the ZK code I was looking through
>> >> >> >>
>> >> >> >> https://github.com/apache/zookeeper/blob/release-3.4.3/src/java/main/org/apache/zookeeper/ZooKeeperMain.java#L736
>> >> >> >>
>> >> >> >> https://github.com/apache/zookeeper/blob/release-3.4.3/src/java/main/org/apache/zookeeper/ZooKeeper.java#L980
>> >> >> >>
>> >> >> >> Thoughts?
>> >> >> >>
>> >> >> >> Jon.
>> >> >> >>
>> >> >> >> --
>> >> >> >> // Jonathan Hsieh (shay)
>> >> >> >> // Software Engineer, Cloudera
>> >> >> >> // [email protected]
>> >> >> >>
>> >> >>
>> >>
>>
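A minimal sketch of the workaround discussed in the thread, written for a bash script like rolling-restart.sh. Since the 3.4 shell exits 0 whether or not the znode exists, the check looks for "Node does not exist" in the stat output instead of trusting the exit code. The helper names (zk_cli, znode_exists, wait_for_znode_expiry) are hypothetical, zk_cli assumes $bin/hbase zkcli as in the script quoted above, and both stdout and stderr are captured on the assumption that the error message may be written to stderr. This is not necessarily what the HBASE-5603 change looks like.

```shell
#!/usr/bin/env bash
# Workaround sketch: with ZooKeeper 3.4, `zkcli stat` exits 0 even when the
# znode is missing, so inspect its output rather than its exit status.

# Hypothetical wrapper around the ZooKeeper CLI; assumes $bin points at the
# HBase bin directory, as in rolling-restart.sh. Redefine it to stub it out.
zk_cli() {
  "${bin:-.}"/hbase zkcli "$@"
}

# znode_exists PATH
#   returns 0 if the znode appears to exist,
#           1 if the CLI printed "Node does not exist",
#           2 if the CLI itself exited non-zero (abnormal exit).
znode_exists() {
  local out
  # Capture stderr too (assumption: the message may be printed there).
  if ! out=$(zk_cli stat "$1" 2>&1); then
    return 2
  fi
  if printf '%s\n' "$out" | grep -q '^Node does not exist'; then
    return 1
  fi
  return 0
}

# wait_for_znode_expiry PATH [INTERVAL_SECONDS]
#   Polls until the znode is gone (returns 0) or the CLI fails (returns 1).
wait_for_znode_expiry() {
  local znode=$1 interval=${2:-1} rc
  printf 'Waiting for Master ZNode %s to expire' "$znode"
  while :; do
    znode_exists "$znode" && rc=0 || rc=$?
    case $rc in
      0) printf '.'; sleep "$interval" ;;
      1) echo; return 0 ;;  # znode is gone
      *) echo; echo "zkcli failed while checking $znode" >&2; return 1 ;;
    esac
  done
}
```

In rolling-restart.sh the existing while/done loop would then become wait_for_znode_expiry "$zmaster" || exit 1. Keeping a separate return code for a failed zkcli run means an abnormal exit (the exception case Jon mentions) is reported instead of being mistaken for the znode having expired.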
