[
https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549789#comment-13549789
]
ramkrishna.s.vasudevan commented on HBASE-7468:
-----------------------------------------------
So I found the reason. As stated in the comment above, after the rollback we
need to delete the znode. Only after the znode deletion happens can the region
be removed from RIT, and only then will the disable succeed.
In the previous commit, the infinite loops were replaced with finite
loops. So here the
{code}
assertFalse("region is still in transition",
am.getRegionsInTransition().containsKey(regions.get(0).getRegionInfo().getEncodedName()));
{code}
assertion failed, and the test then tried to disable the table, which did not
succeed.
But in the output file attached by Lars, the node deleted event never happened
at all, and I suspect that is because of the session expiry error that came
just after the rollback:
{code}
2013-01-06 21:49:35,500 WARN [Master:0;bunnypig,51009,1357537755267-EventThread] zookeeper.ZKUtil(423): hconnection-0x13c138da85b0019 Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:414)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:188)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:301)
{code}
So my suggestion is that the test should wait until the region is removed from
RIT for the SPLITTING znode, which happens through AM.nodeDeleted(). We should
also introduce a timeout for the test, which is currently missing. The same
testcase does not exist in Trunk.
@Lars
Please provide your thoughts.
> TestSplitTransactionOnCluster hangs frequently
> ----------------------------------------------
>
> Key: HBASE-7468
> URL: https://issues.apache.org/jira/browse/HBASE-7468
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.3
> Reporter: Lars Hofhansl
> Assignee: ramkrishna.s.vasudevan
> Attachments: 7468-jstack.txt, 7468-output.zip,
> TestSplitTransactionOnCluster-jstack.txt
>
>
> This is what I saw once in a local build.
> {code}
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:831)
> at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState(TestSplitTransactionOnCluster.java:650)
> {code}