[ 
https://issues.apache.org/jira/browse/HBASE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552539#comment-13552539
 ] 

ramkrishna.s.vasudevan commented on HBASE-7468:
-----------------------------------------------

I found the issue and am able to reproduce it.
See the logs:
{code}
2013-01-14 14:37:21,760 INFO  [main] regionserver.SplitTransaction(216): 
Starting split of region 
testShouldClearRITWhenNodeFoundInSplittingState,,1358154439514.a9e57d09c58b3ef3b949d602232fb2c2.

2013-01-14 14:37:21,760 DEBUG [main] regionserver.SplitTransaction(871): 
regionserver:61665-0x13c384e4e4f0002 Creating ephemeral node for 
a9e57d09c58b3ef3b949d602232fb2c2 in SPLITTING state

2013-01-14 14:37:21,844 DEBUG [main] zookeeper.ZKAssign(757): 
regionserver:61665-0x13c384e4e4f0002 Attempting to transition node 
a9e57d09c58b3ef3b949d602232fb2c2 from RS_ZK_REGION_SPLITTING to 
RS_ZK_REGION_SPLITTING

2013-01-14 14:37:21,849 DEBUG [Thread-873-EventThread] 
zookeeper.ZooKeeperWatcher(277): master:62334-0x13c384e4e4f001b Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, 
path=/hbase/unassigned

2013-01-14 14:37:21,853 DEBUG [main] zookeeper.ZKUtil(1565): 
regionserver:61665-0x13c384e4e4f0002 Retrieved 140 byte(s) of data from znode 
/hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2; 
data=region=testShouldClearRITWhenNodeFoundInSplittingState,,1358154439514.a9e57d09c58b3ef3b949d602232fb2c2.,
 origin=Ram.Home,61665,1358154325430, state=RS_ZK_REGION_SPLITTING

2013-01-14 14:37:21,918 DEBUG [main] zookeeper.ZKAssign(820): 
regionserver:61665-0x13c384e4e4f0002 Successfully transitioned node 
a9e57d09c58b3ef3b949d602232fb2c2 from RS_ZK_REGION_SPLITTING to 
RS_ZK_REGION_SPLITTING

2013-01-14 14:37:21,919 DEBUG [Thread-873-EventThread] zookeeper.ZKUtil(417): 
master:62334-0x13c384e4e4f001b Set watcher on existing znode 
/hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2
{code}
Here we can observe that the SPLITTING node was created first.  Then we 
transition it from SPLITTING to SPLITTING so that the AM can receive the 
nodeDataChange event. But for the nodeDataChange event to be delivered, the 
nodeChildrenChange event must be handled first so that the master can set a 
watcher on the node.

Now when this hang happens, we can see that the watcher is set by the 
nodeChildrenChange handler only after the transition has already happened, so 
the SPLITTING to SPLITTING event itself is missed.
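
To make the ordering problem concrete, here is a minimal sketch of the race. It is a toy in-memory model with one-shot data watches, not the ZooKeeper or HBase API; the class WatchRaceSketch and its methods create, watchData, setData and run are hypothetical names used only for illustration. It only shows that a data change made before the watcher is registered is never delivered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a znode store with one-shot data watches (assumption: not the real ZooKeeper API). */
public class WatchRaceSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Runnable>> dataWatches = new HashMap<>();

    /** Creates a node with an initial value; no watch can exist yet. */
    void create(String path, String value) {
        data.put(path, value);
    }

    /** Registers a one-shot watch that fires on the next setData for this path. */
    void watchData(String path, Runnable callback) {
        dataWatches.computeIfAbsent(path, p -> new ArrayList<>()).add(callback);
    }

    /** Updates the node and fires (then discards) any watches registered so far. */
    void setData(String path, String value) {
        data.put(path, value);
        List<Runnable> fired = dataWatches.remove(path);
        if (fired != null) {
            fired.forEach(Runnable::run);
        }
    }

    /** Returns how many data-change events the simulated master observes. */
    static int run(boolean watchSetBeforeTransition) {
        WatchRaceSketch zk = new WatchRaceSketch();
        int[] seen = {0};
        String node = "/hbase/unassigned/a9e57d09c58b3ef3b949d602232fb2c2";

        // Region server creates the node in SPLITTING state.
        zk.create(node, "RS_ZK_REGION_SPLITTING");

        if (watchSetBeforeTransition) {
            // Good ordering: master sets its watcher before the transition.
            zk.watchData(node, () -> seen[0]++);
        }

        // Region server transitions SPLITTING -> SPLITTING to trigger a data change.
        zk.setData(node, "RS_ZK_REGION_SPLITTING");

        if (!watchSetBeforeTransition) {
            // Hang ordering: the watcher is set only after the transition.
            zk.watchData(node, () -> seen[0]++);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("watcher set before transition: events seen = " + run(true));
        System.out.println("watcher set after transition:  events seen = " + run(false));
    }
}
```

In this model the first ordering sees one event and the second sees none, which matches the log above where the "Set watcher on existing znode" line (14:37:21,919) comes after "Successfully transitioned node" (14:37:21,918).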
                
> TestSplitTransactionOnCluster hangs frequently
> ----------------------------------------------
>
>                 Key: HBASE-7468
>                 URL: https://issues.apache.org/jira/browse/HBASE-7468
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3
>            Reporter: Lars Hofhansl
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.94.5
>
>         Attachments: 7468-0.94.txt, 7468-0.94-v2.txt, 7468-0.94-v4.txt, 
> 7468-jstack.txt, 7468-output.zip, HBASE-7468v3.patch, 
> TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml, 
> TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml, 
> TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml, 
> TestSplitTransactionOnCluster-jstack.txt
>
>
> This is what I saw once in a local build.
> {code}
> java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:831)
>         at 
> org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState(TestSplitTransactionOnCluster.java:650)
> {code}

