[
https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283058#comment-16283058
]
Appy commented on HBASE-19457:
------------------------------
Findings from today:
- Checked ~10 runs, these two are failing in all.
org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure.testRecoveryAndDoubleExecutionPreserveSplits
org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure.testSimpleTruncatePreserveSplits
- Tried debugging testSimpleTruncatePreserveSplits (since it's simpler test),
dug around, added a bunch of logging messages (will commit shortly), in the
end, it's not even the culprit. It finished perfectly, but then times out
cleaning up tables in @After TestTableDDLProcedureBase#teadDown.
- More importantly, table testRecoveryAndDoubleExecutionPreserveSplits is the
issue.
{noformat}
2017-12-06 22:42:44,216 DEBUG [ProcExecWrkr-1]
procedure.TruncateTableProcedure(134): truncate
'testSimpleTruncatePreserveSplits' completed
2017-12-06 22:42:44,260 INFO [ProcExecWrkr-1]
procedure2.ProcedureExecutor(1246): Finished pid=99, state=SUCCESS;
TruncateTableProcedure (table=testSimpleTruncatePreserveSplits
preserveSplits=true) in 2.0060sec
2017-12-06 22:42:44,508 INFO [Time-limited test] hbase.Waiter(189): Waiting up
to [60,000] milli-secs(wait.for.ratio=[1])
2017-12-06 22:42:44,515 INFO [Time-limited test] hbase.Waiter(189): Waiting up
to [60,000] milli-secs(wait.for.ratio=[1])
2017-12-06 22:42:44,520 DEBUG [Time-limited test] util.CommonFSUtils(736):
Current file system:
2017-12-06 22:42:44,521 DEBUG [Time-limited test] util.CommonFSUtils(754):
|-.tabledesc/
2017-12-06 22:42:44,522 DEBUG [Time-limited test] util.CommonFSUtils(757):
|----.tableinfo.0000000001
2017-12-06 22:42:44,522 DEBUG [Time-limited test] util.CommonFSUtils(754):
|-.tmp/
2017-12-06 22:42:44,523 DEBUG [Time-limited test] util.CommonFSUtils(754):
|-1c6cd8294a789f4998b71741869e1aa7/
2017-12-06 22:42:44,524 DEBUG [Time-limited test] util.CommonFSUtils(757):
|----.regioninfo
2017-12-06 22:42:44,524 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----f1/
2017-12-06 22:42:44,524 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----f2/
2017-12-06 22:42:44,525 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----recovered.edits/
2017-12-06 22:42:44,526 DEBUG [Time-limited test] util.CommonFSUtils(757):
|-------2.seqid
2017-12-06 22:42:44,526 DEBUG [Time-limited test] util.CommonFSUtils(754):
|-5bd5770f4cb5fda71c84441d6be2c0e7/
2017-12-06 22:42:44,527 DEBUG [Time-limited test] util.CommonFSUtils(757):
|----.regioninfo
2017-12-06 22:42:44,527 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----f1/
2017-12-06 22:42:44,527 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----f2/
2017-12-06 22:42:44,528 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----recovered.edits/
2017-12-06 22:42:44,529 DEBUG [Time-limited test] util.CommonFSUtils(757):
|-------2.seqid
2017-12-06 22:42:44,529 DEBUG [Time-limited test] util.CommonFSUtils(754):
|-c24958fa636e29e21028e350d32623fb/
2017-12-06 22:42:44,530 DEBUG [Time-limited test] util.CommonFSUtils(757):
|----.regioninfo
2017-12-06 22:42:44,530 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----f1/
2017-12-06 22:42:44,530 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----f2/
2017-12-06 22:42:44,531 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----recovered.edits/
2017-12-06 22:42:44,532 DEBUG [Time-limited test] util.CommonFSUtils(757):
|-------2.seqid
2017-12-06 22:42:44,532 DEBUG [Time-limited test] util.CommonFSUtils(754):
|-e9f2e3d66bf51aa5d73745df63badf9a/
2017-12-06 22:42:44,533 DEBUG [Time-limited test] util.CommonFSUtils(757):
|----.regioninfo
2017-12-06 22:42:44,533 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----f1/
2017-12-06 22:42:44,533 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----f2/
2017-12-06 22:42:44,534 DEBUG [Time-limited test] util.CommonFSUtils(754):
|----recovered.edits/
2017-12-06 22:42:44,535 DEBUG [Time-limited test] util.CommonFSUtils(757):
|-------2.seqid
2017-12-06 22:42:44,556 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=43202] ipc.CallRunner(141):
callId: 233 service: ClientService methodName: Scan size: 138 connection:
67.195.81.155:48320 deadline: 1512600224556
.....some scanner errors for NotServingRegionException, but then retry
passes....
2017-12-06 22:42:44,998 DEBUG [Time-limited test] client.ClientScanner(241):
Advancing internal scanner to startKey at 'a', inclusive
2017-12-06 22:42:45,000 DEBUG [Time-limited test] client.ClientScanner(241):
Advancing internal scanner to startKey at 'b', inclusive
2017-12-06 22:42:45,002 DEBUG [Time-limited test] client.ClientScanner(241):
Advancing internal scanner to startKey at 'c', inclusive
2017-12-06 22:42:45,005 WARN [Time-limited test]
procedure2.ProcedureTestingUtility(146): Set Kill before store update to: false
2017-12-06 22:42:45,012 INFO [Time-limited test]
procedure.TestTableDDLProcedureBase(67): Tear down, remove
table=testRecoveryAndDoubleExecutionPreserveSplits
2017-12-06 22:42:45,013 INFO [Time-limited test] client.HBaseAdmin$15(895):
Started disable of testRecoveryAndDoubleExecutionPreserveSplits
2017-12-06 22:42:45,014 INFO
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318]
master.HMaster$8(2284): Client=jenkins//67.195.81.155 disable
testRecoveryAndDoubleExecutionPreserveSplits
2017-12-06 22:42:45,103 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318]
procedure2.ProcedureExecutor(868): Stored pid=104,
state=RUNNABLE:DISABLE_TABLE_PREPARE; DisableTableProcedure
table=testRecoveryAndDoubleExecutionPreserveSplits
2017-12-06 22:42:45,105 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318]
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:45,207 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318]
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:45,409 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318]
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:45,711 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318]
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:46,202 INFO [ReadOnlyZKClient]
zookeeper.ReadOnlyZKClient(306): 0x5adb9ac4 no activities for 60000 ms, close
active connection. Will reconnect next time when there are new requests.
2017-12-06 22:42:46,213 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318]
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
2017-12-06 22:42:47,215 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=39318]
master.MasterRpcServices(1124): Checking to see if procedure is done pid=104
{noformat}
Onto debugging testRecoveryAndDoubleExecutionPreserveSplits now.
> Debugging flaky TestTruncateTableProcedure
> ------------------------------------------
>
> Key: HBASE-19457
> URL: https://issues.apache.org/jira/browse/HBASE-19457
> Project: HBase
> Issue Type: Bug
> Reporter: Appy
> Assignee: Appy
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)