[
https://issues.apache.org/jira/browse/HBASE-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090147#comment-13090147
]
ramkrishna.s.vasudevan commented on HBASE-4138:
-----------------------------------------------
@Ted,
I debugged the test failures and arrived at a few observations. Please check and
correct me if my analysis is wrong.
-> In all the failure scenarios we can see that a new connection was formed just
before the exception occurred. The test cases invoke new HTable(), which flows
into
{code}HConnectionManager.getConnection(conf);{code}
-> Now a new connection is retrieved. The new ZooKeeper connection tries to
watch the master and root region server nodes (MasterAddressTracker.start() and
RootRegionTracker().start()).
-> In ZKUtil.watchAndCheckExists() api
{code}
Stat s = zkw.getRecoverableZooKeeper().exists(znode, zkw);
LOG.debug(zkw.prefix("Set watcher on existing znode " + znode));
return s != null ? true : false;
{code}
We print the log message and then return. In the failure logs this znode has
the proper value, e.g. /hbase/master. If this call had returned true, the next
step in the start() api would be to get the data
{code}byte [] data = ZKUtil.getDataAndWatch(watcher, node);{code}
If there had been some data, then the log
{code}
LOG.debug(zkw.prefix("Retrieved " + ((data == null)? 0: data.length) +
{code}
should be present, but it is absent, and there are no exceptions either.
So what must have happened is that
{code}ZKUtil.watchAndCheckExists(){code} returned false. This api
returns false when the node does not exist.
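To make the reasoning above easier to follow, here is a minimal in-memory sketch of the flow. It is not HBase code: the class TrackerStartSketch, the map standing in for ZooKeeper, and the shortened "Retrieved" log line are all assumptions made for illustration. It only shows that when the existence check returns false, the "Set watcher" message is still logged while the "Retrieved" message never appears, which matches the failure logs.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified in-memory model of the tracker start-up flow; hypothetical, not HBase code. */
public class TrackerStartSketch {
    /** Hypothetical stand-in for the ZooKeeper namespace: znode path -> data. */
    private final Map<String, byte[]> znodes = new HashMap<>();
    /** Collected log lines, so the flow can be inspected afterwards. */
    private final StringBuilder log = new StringBuilder();

    public void createZnode(String path, byte[] data) {
        znodes.put(path, data);
    }

    /** Modeled on the quoted snippet: logs the message, then reports existence. */
    public boolean watchAndCheckExists(String znode) {
        boolean exists = znodes.containsKey(znode);
        // Logged whether or not the znode exists, as in the quoted code.
        log.append("Set watcher on existing znode ").append(znode).append('\n');
        return exists;
    }

    /** Logs how many bytes were read (log text shortened for this sketch). */
    public byte[] getDataAndWatch(String znode) {
        byte[] data = znodes.get(znode);
        log.append("Retrieved ").append(data == null ? 0 : data.length).append('\n');
        return data;
    }

    /** Models start(): data is fetched only when the existence check succeeds. */
    public byte[] start(String znode) {
        if (watchAndCheckExists(znode)) {
            return getDataAndWatch(znode);
        }
        return null; // missing znode: no "Retrieved" log line and no exception
    }

    public String log() {
        return log.toString();
    }

    public static void main(String[] args) {
        TrackerStartSketch zk = new TrackerStartSketch();
        zk.start("/hbase/master"); // the znode was never created in this model
        System.out.print(zk.log());
    }
}
```

Calling createZnode("/hbase/master", ...) before start() makes the "Retrieved" line appear, which is the case we do not see in the failing runs.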
Now what we need to know is in which scenario the node /hbase itself gets
deleted, and what made new HTable() create a new connection. (Maybe the
connection got deleted.)
One more thing we need to add is in HConnectionManager.setupZookeeperTrackers()
{code} masterAddressTracker.start(){code}
if it is not able to establish the watch, it should throw an error. Correct me
if I am wrong.
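A rough sketch of what I mean by throwing an error, under the assumption that a missing znode at start-up is treated as a failure. The class FailFastTrackerSketch, the exception type, and the Set used in place of ZooKeeper are hypothetical names for illustration only; the actual change in setupZookeeperTrackers() may look quite different.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical fail-fast variant of tracker start-up; not the HBase implementation. */
public class FailFastTrackerSketch {

    /** Hypothetical unchecked exception signalling that the tracker could not start. */
    public static class WatchNotEstablishedException extends RuntimeException {
        public WatchNotEstablishedException(String message) {
            super(message);
        }
    }

    /**
     * Starts a tracker for the given znode, or throws if the znode is absent.
     * existingZnodes is a stand-in for the znodes currently present in ZooKeeper.
     */
    public static void startOrFail(Set<String> existingZnodes, String znode) {
        if (!existingZnodes.contains(znode)) {
            // Surface the problem immediately instead of letting the caller
            // wait indefinitely for a location that never arrives.
            throw new WatchNotEstablishedException("znode does not exist: " + znode);
        }
    }

    public static void main(String[] args) {
        try {
            startOrFail(Collections.<String>emptySet(), "/hbase/master");
        } catch (WatchNotEstablishedException e) {
            System.out.println("start failed: " + e.getMessage());
        }
    }
}
```

With something like this the client would see the failure right away rather than looping in blockUntilAvailable().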
> If zookeeper.znode.parent is not specified explicitly in Client code then
> HTable object loops continuously waiting for the root region by using /hbase
> as the base node.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-4138
> URL: https://issues.apache.org/jira/browse/HBASE-4138
> Project: HBase
> Issue Type: Bug
> Components: client
> Affects Versions: 0.90.3
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: HBASE-4138_trunk_1.patch, HBASE-4138_trunk_2.patch,
> HBASE-4138_trunk_3.patch
>
>
> Change the zookeeper.znode.parent property (default is /hbase).
> Now do not specify this change in the client code.
> Use the HTable Object.
> The HTable is not able to find the root region and keeps continuously looping.
> Find the stack trace:
> ====================
> Object.wait(long) line: not available [native method]
> RootRegionTracker(ZooKeeperNodeTracker).blockUntilAvailable(long) line: 122
> RootRegionTracker.waitRootRegionLocation(long) line: 73
> HConnectionManager$HConnectionImplementation.locateRegion(byte[],
> byte[], boolean) line: 578
> HConnectionManager$HConnectionImplementation.locateRegion(byte[],
> byte[]) line: 558
> HConnectionManager$HConnectionImplementation.locateRegionInMeta(byte[],
> byte[], byte[], boolean, Object) line: 687
> HConnectionManager$HConnectionImplementation.locateRegion(byte[],
> byte[], boolean) line: 589
> HConnectionManager$HConnectionImplementation.locateRegion(byte[],
> byte[]) line: 558
> HConnectionManager$HConnectionImplementation.locateRegionInMeta(byte[],
> byte[], byte[], boolean, Object) line: 687
> HConnectionManager$HConnectionImplementation.locateRegion(byte[],
> byte[], boolean) line: 593
> HConnectionManager$HConnectionImplementation.locateRegion(byte[],
> byte[]) line: 558
> HTable.<init>(Configuration, byte[]) line: 171
> HTable.<init>(Configuration, String) line: 145
> HBaseTest.test() line: 45