[ https://issues.apache.org/jira/browse/KUDU-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025491#comment-17025491 ]
ASF subversion and git services commented on KUDU-3046:
-------------------------------------------------------
Commit 6e4dd49a4716f1aed4f533a85b794dfd04f3ab96 in kudu's branch
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=6e4dd49 ]
KUDU-3046: deflake TabletServerQuiescingITest
The test was flaky for a number of reasons including:
- Slowness in TSAN mode along with a low Raft timeout meant workloads
  would fail to even create tablets.
  - Addressed this by increasing the heartbeat interval in TSAN mode.
- Not hitting the exact number of scanners when running the tool because
  of a TOCTOU race between checking the number of scanners and running
  the tool.
  - Addressed this by reducing the number of read threads and thus
    reducing the degrees of freedom with which the tool can run (either
    0 scanners or 1 scanner).
- TestAbruptStepdownWhileAllQuiescing failed because the test would step
  down a leader without the guarantee that it was the latest leader, so
  a leader could still exist even after stepping down.
  - Addressed this by stepping down on all tablet servers just to be
    sure, and retrying if necessary via ASSERT_EVENTUALLY.
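
As an illustration of the last fix above, here is a minimal sketch of the
"step down everywhere, then retry until no leader remains" pattern. It is
not Kudu's code: FakeReplica, StepDownAll, AnyLeader, RetryUntil, and
StepDownUntilNoLeader are hypothetical names invented for this example,
and RetryUntil only approximates the general idea of a retrying assertion
such as ASSERT_EVENTUALLY, whose real implementation is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tablet replica; NOT Kudu's real API.
struct FakeReplica {
  bool is_leader = false;
  // Number of step-down requests needed before leadership is actually
  // relinquished (models a step-down that does not take effect at once).
  int stepdowns_needed = 1;

  void StepDown() {
    if (is_leader && --stepdowns_needed <= 0) {
      is_leader = false;
    }
  }
};

// Ask every replica to step down instead of only the one that was last
// observed to be leader, since that observation may already be stale.
void StepDownAll(std::vector<FakeReplica>& replicas) {
  for (auto& r : replicas) {
    r.StepDown();
  }
}

bool AnyLeader(const std::vector<FakeReplica>& replicas) {
  for (const auto& r : replicas) {
    if (r.is_leader) return true;
  }
  return false;
}

// Calls `attempt` until it returns true or `timeout` elapses, sleeping
// `backoff` between tries. Returns whether an attempt succeeded.
bool RetryUntil(const std::function<bool()>& attempt,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds backoff =
                    std::chrono::milliseconds(1)) {
  assert(attempt && "attempt must be callable");
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (attempt()) return true;
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(backoff);
  }
}

// Repeats the step-down on all replicas until none reports leadership.
bool StepDownUntilNoLeader(std::vector<FakeReplica>& replicas,
                           std::chrono::milliseconds timeout) {
  return RetryUntil(
      [&] {
        StepDownAll(replicas);
        return !AnyLeader(replicas);
      },
      timeout);
}
```

The step-down and the leadership check sit inside the same retried
closure, so a leader that survives one round is simply asked again on
the next attempt.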
There appear to be other sources of flakiness that are less specific
to this test, but this change reduced the failure rate from 4/100 (4%)
to 9/2000 (0.45%), with all remaining failures due to a TSAN issue in
the TestWorkload that I'm still getting to the bottom of.
Change-Id: I3f9ef531062c4b66648840e04962070768fbad5d
Reviewed-on: http://gerrit.cloudera.org:8080/15113
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
> TestQuiescingServerDoesntTriggerElections sometimes fails
> ---------------------------------------------------------
>
> Key: KUDU-3046
> URL: https://issues.apache.org/jira/browse/KUDU-3046
> Project: Kudu
> Issue Type: Bug
> Components: test
> Affects Versions: 1.12.0
> Reporter: Alexey Serbin
> Assignee: Andrew Wong
> Priority: Minor
> Attachments: tablet_server_quiescing-itest.txt.xz
>
>
> The {{TServerQuiescingITest.TestQuiescingServerDoesntTriggerElections}} test
> scenario sometimes fails (in TSAN builds) in {{CreateWorkloadTable()}} with
> messages like the one below:
> {noformat}
> F0127 03:48:05.405407 458 test_workload.cc:330] Timed out: Timed out
> waiting for Table Creation
> {noformat}
> It seems the custom setting for the {{\-\-raft_heartbeat_interval_ms}} flag
> might be too low for TSAN builds.
> The full log is attached.