[ https://issues.apache.org/jira/browse/SOLR-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725884#comment-16725884 ]
Jason Gerlowski commented on SOLR-13045:
----------------------------------------
One of the remaining failures in TestSimPolicyCloud occurs in
{{testCreateCollectionAddShardUsingPolicy}}, when the initial collection
creation (and subsequent shard creation) seems to violate a policy which
specifies that all replicas should be created on the same node. After looking
closer, this appears to come down to a race condition of sorts between two
threads attempting to set the autoscaling.json "ZK" node.
Two different threads touch the autoscaling config node in this test: the
OverseerTriggerThread tries to set the default nodeAdded trigger, and the test
code tries to set a policy that the test relies on. These threads rely on
optimistic concurrency versioning to ensure that updates don't clobber one
another. But SimDistribStateManager has a bug that intermittently defeats this
check. The initial node version in the sim framework is -1, which is also the
sentinel value meaning "I don't care about concurrency, just overwrite the
node". (For comparison, ZkDistribStateManager has node versions start at 0.)
Depending on timing, this lets the write of the default nodeAdded trigger
clobber the policy that our test relies on, causing the test to fail.
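To make the -1 collision concrete, here is a minimal, self-contained Java
sketch. It is not the SimDistribStateManager code: the Node class, the
ANY_VERSION constant, and staleWriteAccepted are hypothetical names made up for
illustration, and the only behavior assumed from the description above is that
a write passing version -1 skips the version check.

```java
public class VersionedNodeSketch {
    /** Sentinel meaning "skip the version check and overwrite" (assumed to be -1, as described above). */
    static final int ANY_VERSION = -1;

    /** Hypothetical stand-in for a versioned config node; not a real Solr class. */
    static final class Node {
        private int version;
        private String data = "";

        Node(int initialVersion) {
            this.version = initialVersion;
        }

        synchronized int getVersion() {
            return version;
        }

        synchronized String getData() {
            return data;
        }

        /** Compare-and-set: succeeds only if expectedVersion matches, unless it is ANY_VERSION. */
        synchronized boolean setData(String newData, int expectedVersion) {
            if (expectedVersion != ANY_VERSION && expectedVersion != version) {
                return false; // caller lost the race and should re-read before retrying
            }
            data = newData;
            version++;
            return true;
        }
    }

    /**
     * Two writers both read the node version before either one writes.
     * Returns true if the second (stale) write was accepted anyway.
     */
    static boolean staleWriteAccepted(int initialVersion) {
        Node node = new Node(initialVersion);
        int seenByTest = node.getVersion();      // test code about to set its policy
        int seenByOverseer = node.getVersion();  // trigger thread about to set nodeAdded
        node.setData("policy", seenByTest);
        return node.setData("nodeAdded trigger", seenByOverseer);
    }

    public static void main(String[] args) {
        System.out.println("initial version -1: stale write accepted = " + staleWriteAccepted(-1));
        System.out.println("initial version  0: stale write accepted = " + staleWriteAccepted(0));
    }
}
```

With an initial version of -1, both writers read -1, so both writes skip the
check and the second silently overwrites the first. With an initial version of
0, the second writer's version no longer matches and its write is rejected,
which gives that writer the chance to re-read and retry.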
So one fix that'll make this test (and probably others in the sim framework)
more reliable is to ensure that SimDistribStateManager's node-versioning lines
up better with ZkDistribStateManager's. Or at least that it avoids this -1
edge case. I've been testing variations of a patch to accomplish this, and
will upload my results shortly.
> Harden TestSimPolicyCloud
> -------------------------
>
> Key: SOLR-13045
> URL: https://issues.apache.org/jira/browse/SOLR-13045
> Project: Solr
> Issue Type: Test
> Security Level: Public(Default Security Level. Issues are Public)
> Components: AutoScaling
> Affects Versions: master (8.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13045.patch, SOLR-13045.patch, jenkins.log.txt.gz
>
>
> Several tests in TestSimPolicyCloud, but especially
> {{testCreateCollectionAddReplica}}, have some flaky behavior, even after
> Mark's recent test-fix commit. This JIRA covers looking into and (hopefully)
> fixing this test failure.