[ https://issues.apache.org/jira/browse/HBASE-23985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059142#comment-17059142 ]
Nick Dimiduk commented on HBASE-23985:
--------------------------------------
Attaching log file from the run.
Perhaps noteworthy, {{testMasterZKSessionRecoveryFailure}} also fails during
this run -- all three attempts failed. It's possible that there is "cross-talk"
happening between the two tests. {{testMasterSessionExpired}} runs first,
perhaps leaves some garbage state, and then
{{testMasterZKSessionRecoveryFailure}} gets nowhere. I haven't looked into the
latter's logs yet.
> [flakey test] TestZooKeeper
> ---------------------------
>
> Key: HBASE-23985
> URL: https://issues.apache.org/jira/browse/HBASE-23985
> Project: HBase
> Issue Type: Test
> Components: test
> Affects Versions: 3.0.0
> Reporter: Nick Dimiduk
> Priority: Major
> Attachments: TEST-org.apache.hadoop.hbase.TestZooKeeper.xml
>
> I observed a test failure in {{TestZooKeeper#testMasterSessionExpired}} on my
> local rig. On a casual read of the logs from {{testMasterSessionExpired}}, it
> appears we have a faulty assumption related to master MTTR; the master abort
> is logged ~1250ms after ZK session close, which seems entirely too fast to
> me. Once the master aborts, the damage is done and the test cannot recover.
> The first re-run passes. Surefire does not keep logs of successful tests, so
> I don't know the timing between events in the successful run.