[ https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549789#comment-13549789 ]

ramkrishna.s.vasudevan commented on HBASE-7468:
-----------------------------------------------

So I found the reason. As stated in the comment above, after the rollback we
need to delete the znode. Only after the znode is deleted can the region be
removed from RIT, and only then will the disable succeed.
In the previous commit, the infinite loops were removed and replaced with
finite loops. So basically, here the
{code}
assertFalse("region is still in transition",
    am.getRegionsInTransition().containsKey(regions.get(0).getRegionInfo().getEncodedName()));
{code}
assertion failed, and the test tried to disable the table, which never
completed.
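To make that ordering concrete, here is a minimal, hypothetical Java model (not HBase's actual AssignmentManager code). The region's RIT entry is cleared only by a nodeDeleted() callback, and the wait before the assertion is a finite loop. `RitModel`, `markSplitting()` and `waitUntilOutOfTransition()` are names made up for illustration. If the callback never arrives, the finite loop gives up and returns false, which is where the assertion fails instead of the test looping forever.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified model (NOT the real HBase AssignmentManager):
 * a region leaves RIT only when the znode-deleted callback fires.
 */
final class RitModel {
    // encoded region name -> state; stands in for am.getRegionsInTransition()
    private final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();

    void markSplitting(String encodedName) {
        regionsInTransition.put(encodedName, "SPLITTING");
    }

    // Stands in for AM.nodeDeleted(); this is the only path that clears the entry.
    void nodeDeleted(String encodedName) {
        regionsInTransition.remove(encodedName);
    }

    boolean isInTransition(String encodedName) {
        return regionsInTransition.containsKey(encodedName);
    }

    // Finite wait: poll at most maxTries times, then give up instead of looping forever.
    boolean waitUntilOutOfTransition(String encodedName, int maxTries, long sleepMs) {
        for (int i = 0; i < maxTries; i++) {
            if (!isInTransition(encodedName)) {
                return true;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return !isInTransition(encodedName);
    }

    public static void main(String[] args) {
        RitModel am = new RitModel();
        am.markSplitting("region-a");
        // The node-deleted event has not arrived, so the entry stays in RIT
        // and the finite wait gives up.
        System.out.println(am.waitUntilOutOfTransition("region-a", 5, 10)); // false
        am.nodeDeleted("region-a");
        System.out.println(am.waitUntilOutOfTransition("region-a", 5, 10)); // true
    }
}
```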
However, in the output file attached by Lars, the node-deleted event never
happened at all, and I suspect that is because of the session expiry error
that occurred just after the rollback:
{code}
2013-01-06 21:49:35,500 WARN  [Master:0;bunnypig,51009,1357537755267-EventThread] zookeeper.ZKUtil(423): hconnection-0x13c138da85b0019 Unable to set watcher on znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:414)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:188)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:301)
{code}
So my suggestion is that we wait until the region is removed from RIT for the
SPLITTING znode, which happens through AM.nodeDeleted(). We should also
introduce a timeout for the test, which is currently missing. The same test
case does not exist in trunk.
@Lars
Please provide your thoughts.
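A minimal sketch of the suggested wait, assuming a hypothetical test helper `WaitUtil.waitFor()` (not an existing HBase API): poll until a condition holds or a deadline passes, so a missing nodeDeleted() event becomes a bounded failure instead of a hang.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper; not an existing HBase utility. */
final class WaitUtil {
    private WaitUtil() {}

    /**
     * Polls the condition every intervalMs until it returns true or timeoutMs
     * elapses. Returns true if the condition held before the deadline.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            // Overflow-safe deadline comparison for nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return condition.getAsBoolean(); // one last check at the deadline
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return condition.getAsBoolean();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50 ms, well within the 1 s budget.
        System.out.println(waitFor(1000, 10, () -> System.currentTimeMillis() - start >= 50));
        // Condition never becomes true: returns false after ~100 ms instead of hanging.
        System.out.println(waitFor(100, 10, () -> false));
    }
}
```

In the test, this could be called before disabling the table, e.g. `assertTrue(WaitUtil.waitFor(60000, 100, () -> !am.getRegionsInTransition().containsKey(encodedName)))`, with the whole test additionally bounded by JUnit 4's `@Test(timeout = ...)` (in milliseconds). The 60 s and 100 ms values are arbitrary placeholders, and the sketch uses Java 8 lambdas for brevity; an older toolchain would need an anonymous class instead.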

                
> TestSplitTransactionOnCluster hangs frequently
> ----------------------------------------------
>
>                 Key: HBASE-7468
>                 URL: https://issues.apache.org/jira/browse/HBASE-7468
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: Lars Hofhansl
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: 7468-jstack.txt, 7468-output.zip, TestSplitTransactionOnCluster-jstack.txt
>
>
> This is what I saw once in a local build.
> {code}
> java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:831)
>         at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState(TestSplitTransactionOnCluster.java:650)
> {code}
