[ https://issues.apache.org/jira/browse/ZOOKEEPER-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573028#comment-14573028 ]
Chris Nauroth commented on ZOOKEEPER-2204:
------------------------------------------
No worries. +1 (non-binding) for the latest patch. Thanks again for the
contribution.
> LearnerSnapshotThrottlerTest.testHighContentionWithTimeout fails occasionally
> -----------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2204
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2204
> Project: ZooKeeper
> Issue Type: Test
> Affects Versions: 3.5.0
> Reporter: Donny Nadolny
> Assignee: Donny Nadolny
> Priority: Minor
> Attachments: ZOOKEEPER-2204.patch, ZOOKEEPER-2204.patch
>
>
> The {{LearnerSnapshotThrottler}} will only allow 2 concurrent snapshots to be
> taken, and if there are already 2 snapshots in progress it will wait up to
> 200ms for one to complete. This isn't enough time for
> {{testHighContentionWithTimeout}} to pass consistently: on a cold JVM
> running just the one test, I was able to get it to fail 3 times in around 50
> runs. This 200ms timeout will be hit if there is a delay between a thread
> calling {{LearnerSnapshot snap = throttler.beginSnapshot(false);}} and
> {{throttler.endSnapshot();}}.
> The test also fails spuriously on the build server; see
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2747/testReport/org.apache.zookeeper.server.quorum/LearnerSnapshotThrottlerTest/testHighContentionWithTimeout/
> for an example.
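
To make the failure mode concrete, here is a minimal sketch of a bounded-concurrency throttle with a wait timeout. It is not ZooKeeper's {{LearnerSnapshotThrottler}}: the class name {{ThrottleSketch}}, the methods {{begin}} and {{end}}, and the use of {{java.util.concurrent.Semaphore}} are all assumptions made for illustration, and the real class may be implemented quite differently.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for illustration; not the ZooKeeper class. */
class ThrottleSketch {
    private final Semaphore permits;
    private final long timeoutMillis;

    ThrottleSketch(int maxConcurrent, long timeoutMillis) {
        this.permits = new Semaphore(maxConcurrent);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Tries to take a slot, waiting at most timeoutMillis.
     * Returns false if no slot became free in time (or on interrupt).
     */
    boolean begin() {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    /** Releases a slot taken by a successful begin(). */
    void end() {
        permits.release();
    }
}
```

With a limit of 2, a third caller only succeeds if one of the two holders calls {{end()}} before the timeout elapses. Any pause longer than the timeout between a holder's begin and end (JIT warmup, GC, a busy build machine) therefore turns into a timeout for the waiter, even though the throttle itself is behaving correctly.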
> I have bumped the timeout up to 5 seconds (which should be more than enough
> for warmup / gc pauses), as well as added logging to the {{catch (Exception
> e)}} block to assist in debugging any future issues.
> An alternate approach would be to separate out results gathered from the
> threads, because although we only record true/false there are really three
> outcomes:
> 1. The {{snapshotNumber}} was <= 2, meaning the individual call operated
> correctly
> 2. The {{snapshotNumber}} was > 2, meaning the test should definitely fail
> 3. We were unable to snapshot in the time given, so we can't determine if we
> should fail or pass (although if we have "enough" successes from #1 with no
> failures from #2 maybe we would pass the test anyway).
> Bumping up the timeout is easier.
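
For reference, the three-outcome alternative described above might look roughly like the sketch below. The names {{OutcomeSketch}}, {{Outcome}}, {{classify}}, {{shouldPass}} and the {{minSuccesses}} threshold are hypothetical and do not come from the patch; a {{null}} snapshot number is used here only as a stand-in for "the call timed out".

```java
import java.util.List;

/** Hypothetical sketch of the three-outcome bookkeeping; not from the patch. */
class OutcomeSketch {
    enum Outcome { WITHIN_LIMIT, OVER_LIMIT, TIMED_OUT }

    /**
     * Classifies one worker's result.
     * snapshotNumber == null models a call that could not snapshot in time.
     */
    static Outcome classify(Integer snapshotNumber, int maxConcurrent) {
        if (snapshotNumber == null) {
            return Outcome.TIMED_OUT;
        }
        return snapshotNumber <= maxConcurrent
                ? Outcome.WITHIN_LIMIT
                : Outcome.OVER_LIMIT;
    }

    /**
     * Fails on any OVER_LIMIT result; otherwise passes only if "enough"
     * calls were observed within the limit. Timeouts are inconclusive.
     */
    static boolean shouldPass(List<Outcome> outcomes, int minSuccesses) {
        int withinLimit = 0;
        for (Outcome o : outcomes) {
            if (o == Outcome.OVER_LIMIT) {
                return false;
            }
            if (o == Outcome.WITHIN_LIMIT) {
                withinLimit++;
            }
        }
        return withinLimit >= minSuccesses;
    }
}
```

The cost of this approach is that someone has to pick what "enough" successes means, which is one more tunable that can itself make the test flaky. That is part of why raising the timeout is the simpler fix.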