[
https://issues.apache.org/jira/browse/KUDU-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273819#comment-17273819
]
ASF subversion and git services commented on KUDU-3237:
-------------------------------------------------------
Commit cbb8db7711785e85e7c74fe86b93d7c1ba9c9c24 in kudu's branch
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=cbb8db7 ]
KUDU-3237 fix MaintenanceManagerTest.TestCompletedOpsHistory
This patch fixes a flakiness in the TestCompletedOpsHistory scenario.
The flakiness is a test-only issue which became apparent as a result of
the recent changes introduced into the MaintenanceManager with 9e4664d44
changelist. In essence, with finer granularity of locking in the
scoped cleanup of the MaintenanceManager::LaunchOp() method, the test
thread calling MaintenanceManager::GetMaintenanceManagerStatusDump()
has a slight chance of acquiring 'completed_ops_lock_' ahead of the
thread executing the code in the LaunchOp()'s scoped cleanup.
This patch wraps the related code into ASSERT_EVENTUALLY to resolve
test-only race condition mentioned above. I verified that this patch
fixes the issue by running the test scenario multiple times under
dist-test (RELEASE build).
Before: 2 out of 256 runs failed
http://dist-test.cloudera.org//job?job_id=aserbin.1611806979.74192
After : 0 out of 256 runs failed
http://dist-test.cloudera.org//job?job_id=aserbin.1611809676.95320
Change-Id: I760287b3ed4d50e32d2f9257e5390fdf8fa8f288
Reviewed-on: http://gerrit.cloudera.org:8080/16991
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>
> MaintenanceManagerTest.TestCompletedOpsHistory is flaky
> -------------------------------------------------------
>
> Key: KUDU-3237
> URL: https://issues.apache.org/jira/browse/KUDU-3237
> Project: Kudu
> Issue Type: Test
> Reporter: Hao Hao
> Priority: Major
>
> Came across test failure in MaintenanceManagerTest.TestCompletedOpsHistory as
> the following:
> {noformat}
> I0125 19:55:10.782884 24454 maintenance_manager.cc:594] P 12345: op5
> complete. Timing: real 0.000s user 0.000s sys 0.000s Metrics: {}
> /data/1/hao/Repositories/kudu/src/kudu/util/maintenance_manager-test.cc:525:
> Failure
> Expected: std::min(kHistorySize, i + 1)
> Which is: 6
> To be equal to: status_pb.completed_operations_size()
> Which is: 5
> I0125 19:55:10.783524 24420 test_util.cc:148]
> -----------------------------------------------
> I0125 19:55:10.783561 24420 test_util.cc:149] Had fatal failures, leaving
> test files at
> /tmp/dist-test-task1ofSWE/test-tmp/maintenance_manager-test.0.MaintenanceManagerTest.TestCompletedOpsHistory.1611604508702756-24420
> {noformat}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)