[ 
https://issues.apache.org/jira/browse/HBASE-15855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289579#comment-15289579
 ] 

Stephen Yuan Jiang commented on HBASE-15855:
--------------------------------------------

[~tedyu], are you running the test on branch-1 or another 1.x branch?  I could 
not find the {{testFailedSplit}} test in the master branch.  

I am not sure whether the RS dying is part of the test.  If it is not, we need 
to figure out why the RS dies.  Judging by the test name, killing the RS is 
probably intentional, in which case we do have to wait for SSH 
(ServerShutdownHandler) to complete before asserting.
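
A minimal sketch of that waiting step, under stated assumptions: the helper {{waitFor}} and the class name are hypothetical and not taken from the HBase test code, and the condition passed in is a stand-in for whatever check the test would use to ask the master whether dead server processing is still pending. It only illustrates polling with a timeout before making the assertion.

```java
import java.util.function.BooleanSupplier;

public class WaitForDeadServers {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Hypothetical helper for illustration; not part of HBase.
     *
     * @return true if the condition became true in time, false otherwise
     */
    public static boolean waitFor(long timeoutMs, long intervalMs,
                                  BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "the master is still processing a dead region server".
        // In a real test this would query the cluster instead (assumption).
        final long doneAt = System.currentTimeMillis() + 200;
        BooleanSupplier deadServersInProgress =
            () -> System.currentTimeMillis() < doneAt;

        boolean settled =
            waitFor(5000, 50, () -> !deadServersInProgress.getAsBoolean());
        if (!settled) {
            throw new AssertionError("dead server processing did not finish");
        }
        // Only now would the test make its original assertion.
        System.out.println("dead server processing finished");
    }
}
```

Returning a boolean instead of throwing lets the test decide whether a timeout is a failure or just a reason to log more context before the original assertion runs.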

> TestSplitTransactionOnCluster#testFailedSplit may fail due to pending dead 
> server processing
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15855
>                 URL: https://issues.apache.org/jira/browse/HBASE-15855
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Priority: Minor
>         Attachments: testFailedSplit.err
>
>
> Sometimes TestSplitTransactionOnCluster#testFailedSplit fails with:
> {code}
> java.lang.AssertionError: null
>       at org.junit.Assert.fail(Assert.java:86)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.junit.Assert.assertTrue(Assert.java:52)
>       at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testFailedSplit(TestSplitTransactionOnCluster.java:1339)
> {code}
> The log shows the reason: the master was still processing a dead region server:
> {code}
> 2016-05-12 14:38:31,022 INFO  [RS:5;c66-slave-20176e25-5:40721-splits-1463063900610] regionserver.SplitRequest(143): Split transaction journal:
>       STARTED at 1463063910621
>       PREPARED at 1463063910628
>       BEFORE_PRE_SPLIT_HOOK at 1463063910628
>       AFTER_PRE_SPLIT_HOOK at 1463063910628
>       SET_SPLITTING at 1463063910632
>       CREATE_SPLIT_DIR at 1463063910743
>       CLOSED_PARENT_REGION at 1463063910768
>       OFFLINED_PARENT at 1463063910768
>       STARTED_REGION_A_CREATION at 1463063910839
>       STARTED_REGION_B_CREATION at 1463063910889
> 2016-05-12 14:38:31,023 DEBUG [Thread-1689-EventThread] zookeeper.ZooKeeperWatcher(511): hbase-admin-on-hconnection-0x9755dd10x0, quorum=localhost:49482, baseZNode=/hbase Received ZooKeeper Event, type=None, state=SyncConnected, path=null
> 2016-05-12 14:38:31,025 DEBUG [Thread-1689-EventThread] zookeeper.ZooKeeperWatcher(574): hbase-admin-on-hconnection-0x9755dd1-0x154a566b247001f connected
> 2016-05-12 14:38:31,052 DEBUG [B.defaultRpcServer.handler=3,queue=0,port=54033] master.HMaster(1373): Not running balancer because processing dead regionserver(s): c66-slave-20176e25-5.novalocal,49562,1463063863793
> 2016-05-12 14:38:31,054 INFO  [Thread-1689] client.ConnectionManager$HConnectionImplementation(1684): Closing zookeeper sessionid=0x154a566b247001c
> {code}
> We should account for dead server processing before making the assertion.



