[ 
https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283058#comment-16283058
 ] 

Appy commented on HBASE-19457:
------------------------------

Findings from today:
- Checked ~10 runs, these two are failing in all.
org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure.testRecoveryAndDoubleExecutionPreserveSplits
org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure.testSimpleTruncatePreserveSplits

- Tried debugging testSimpleTruncatePreserveSplits (since it's simpler test), 
dug around, added a bunch of logging messages (will commit shortly), in the 
end, it's not even the culprit. It finished perfectly, but then times out 
cleaning up tables in @After TestTableDDLProcedureBase#teadDown.
- More importantly, table testRecoveryAndDoubleExecutionPreserveSplits is the 
issue. 
{noformat}
2017-12-06 22:42:44,216 DEBUG [ProcExecWrkr-1] 
procedure.TruncateTableProcedure(134): truncate 
'testSimpleTruncatePreserveSplits' completed
2017-12-06 22:42:44,260 INFO  [ProcExecWrkr-1] 
procedure2.ProcedureExecutor(1246): Finished pid=99, state=SUCCESS; 
TruncateTableProcedure (table=testSimpleTruncatePreserveSplits 
preserveSplits=true) in 2.0060sec
2017-12-06 22:42:44,508 INFO  [Time-limited test] hbase.Waiter(189): Waiting up 
to [60,000] milli-secs(wait.for.ratio=[1])
2017-12-06 22:42:44,515 INFO  [Time-limited test] hbase.Waiter(189): Waiting up 
to [60,000] milli-secs(wait.for.ratio=[1])
2017-12-06 22:42:44,520 DEBUG [Time-limited test] util.CommonFSUtils(736): 
Current file system:
2017-12-06 22:42:44,521 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|-.tabledesc/
2017-12-06 22:42:44,522 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|----.tableinfo.0000000001
2017-12-06 22:42:44,522 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|-.tmp/
2017-12-06 22:42:44,523 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|-1c6cd8294a789f4998b71741869e1aa7/
2017-12-06 22:42:44,524 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|----.regioninfo
2017-12-06 22:42:44,524 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----f1/
2017-12-06 22:42:44,524 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----f2/
2017-12-06 22:42:44,525 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----recovered.edits/
2017-12-06 22:42:44,526 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|-------2.seqid
2017-12-06 22:42:44,526 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|-5bd5770f4cb5fda71c84441d6be2c0e7/
2017-12-06 22:42:44,527 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|----.regioninfo
2017-12-06 22:42:44,527 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----f1/
2017-12-06 22:42:44,527 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----f2/
2017-12-06 22:42:44,528 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----recovered.edits/
2017-12-06 22:42:44,529 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|-------2.seqid
2017-12-06 22:42:44,529 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|-c24958fa636e29e21028e350d32623fb/
2017-12-06 22:42:44,530 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|----.regioninfo
2017-12-06 22:42:44,530 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----f1/
2017-12-06 22:42:44,530 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----f2/
2017-12-06 22:42:44,531 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----recovered.edits/
2017-12-06 22:42:44,532 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|-------2.seqid
2017-12-06 22:42:44,532 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|-e9f2e3d66bf51aa5d73745df63badf9a/
2017-12-06 22:42:44,533 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|----.regioninfo
2017-12-06 22:42:44,533 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----f1/
2017-12-06 22:42:44,533 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----f2/
2017-12-06 22:42:44,534 DEBUG [Time-limited test] util.CommonFSUtils(754): 
|----recovered.edits/
2017-12-06 22:42:44,535 DEBUG [Time-limited test] util.CommonFSUtils(757): 
|-------2.seqid
2017-12-06 22:42:44,556 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=43202] ipc.CallRunner(141): 
callId: 233 service: ClientService methodName: Scan size: 138 connection: 
67.195.81.155:48320 deadline: 1512600224556

.....some scanner errors for NotServingRegionException, but then retry 
passes....

2017-12-06 22:42:44,998 DEBUG [Time-limited test] client.ClientScanner(241): 
Advancing internal scanner to startKey at 'a', inclusive
2017-12-06 22:42:45,000 DEBUG [Time-limited test] client.ClientScanner(241): 
Advancing internal scanner to startKey at 'b', inclusive
2017-12-06 22:42:45,002 DEBUG [Time-limited test] client.ClientScanner(241): 
Advancing internal scanner to startKey at 'c', inclusive
2017-12-06 22:42:45,005 WARN  [Time-limited test] 
procedure2.ProcedureTestingUtility(146): Set Kill before store update to: false
2017-12-06 22:42:45,012 INFO  [Time-limited test] 
procedure.TestTableDDLProcedureBase(67): Tear down, remove 
table=testRecoveryAndDoubleExecutionPreserveSplits
2017-12-06 22:42:45,013 INFO  [Time-limited test] client.HBaseAdmin$15(895): 
Started disable of testRecoveryAndDoubleExecutionPreserveSplits
2017-12-06 22:42:45,014 INFO  
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318] 
master.HMaster$8(2284): Client=jenkins//67.195.81.155 disable 
testRecoveryAndDoubleExecutionPreserveSplits
2017-12-06 22:42:45,103 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318] 
procedure2.ProcedureExecutor(868): Stored pid=104, 
state=RUNNABLE:DISABLE_TABLE_PREPARE; DisableTableProcedure 
table=testRecoveryAndDoubleExecutionPreserveSplits
2017-12-06 22:42:45,105 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318] 
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:45,207 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318] 
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:45,409 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318] 
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:45,711 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318] 
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:46,202 INFO  [ReadOnlyZKClient] 
zookeeper.ReadOnlyZKClient(306): 0x5adb9ac4 no activities for 60000 ms, close 
active connection. Will reconnect next time when there are new requests.
2017-12-06 22:42:46,213 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318] 
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:47,215 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318] 
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
{noformat}

Onto debugging testRecoveryAndDoubleExecutionPreserveSplits now.


> Debugging flaky TestTruncateTableProcedure
> ------------------------------------------
>
>                 Key: HBASE-19457
>                 URL: https://issues.apache.org/jira/browse/HBASE-19457
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: Appy
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to