[
https://issues.apache.org/jira/browse/SOLR-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856041#comment-17856041
]
ASF subversion and git services commented on SOLR-17331:
--------------------------------------------------------
Commit 04acaca3e186e3a1e3f260bf3c3ac8ed32b1ff28 in solr's branch
refs/heads/branch_9x from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=04acaca3e18 ]
SOLR-17331: More optimal placements with OrderedNodePlacementPlugin (#2515)
- Move tests, adding tests for the simple plugin
(cherry picked from commit fc0d84afaa8b49bd0515f796abd901e5150d5982)
> MigrateReplicasTest.testGoodSpreadDuringAssignWithNoTarget is flaky
> -------------------------------------------------------------------
>
> Key: SOLR-17331
> URL: https://issues.apache.org/jira/browse/SOLR-17331
> Project: Solr
> Issue Type: Test
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Yohann Callea
> Assignee: Houston Putman
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The test *_MigrateReplicasTest.testGoodSpreadDuringAssignWithNoTarget_*
> occasionally fails (< 3% failure rate) on its last assertion, as shown by the
> [trend history of test
> failures|#series/org.apache.solr.cloud.MigrateReplicasTest.testGoodSpreadDuringAssignWithNoTarget].
>
> This test spins up a 5-node cluster and creates a collection with 3 shards
> and a replication factor of 2.
> It then vacates 2 randomly chosen nodes using the Migrate Replicas command
> and, once the migration completes, expects the vacated nodes to be assigned
> no replicas and the 6 replicas to be evenly spread across the 3 non-vacated
> nodes (i.e., 2 replicas positioned on each node).
> However, this last assertion sometimes fails because the replicas are not
> always evenly spread over the 3 non-vacated nodes.
> {code:java}
> The non-source node '127.0.0.1:36007_solr' has the wrong number of replicas
> after the migration expected:<2> but was:<1> {code}
>
> Analysing a failure in more detail shows that this test is inherently
> expected to fail under some circumstances, given how the Migrate Replicas
> command operates.
> When migrating replicas, the new positions of the replicas to be moved are
> calculated sequentially and, for every consecutive move, the position is
> decided according to the logic implemented by the currently configured
> replica placement plugin.
> We can therefore end up in the following situation.
> h2. Failing scenario
> Note that this test always uses the default replica placement strategy, which
> is Simple as of today.
> Let's assume the following initial state, after the collection creation.
> {code:java}
>         | NODE_0  | NODE_1  | NODE_2  | NODE_3  | NODE_4  |
> --------+---------+---------+---------+---------+---------+
> SHARD_1 |    X    |         |         |    X    |         |
> SHARD_2 |         |    X    |         |    X    |         |
> SHARD_3 |         |         |    X    |         |    X    | {code}
> The test now runs the migrate command to vacate *_NODE_3_* and
> {*}_NODE_4_{*}. It therefore needs to go through 3 replica movements to
> empty these two nodes.
> h4. Move 1
> We are moving the replica of *_SHARD_1_* positioned on {*}_NODE_3_{*}.
> _*NODE_0*_ is not an eligible destination for this replica as this node is
> already assigned a replica of {*}_SHARD_1_{*}, and both *_NODE_1_* and
> _*NODE_2*_ can be chosen as they host the same number of replicas.
> *_NODE_1_* is arbitrarily chosen amongst the two best candidate destination
> nodes.
> {code:java}
>         | NODE_0  | NODE_1  | NODE_2  | NODE_3  | NODE_4  |
> --------+---------+---------+---------+---------+---------+
> SHARD_1 |    X    |    X    |         |         |         |
> SHARD_2 |         |    X    |         |    X    |         |
> SHARD_3 |         |         |    X    |         |    X    | {code}
> h4. Move 2
> We are moving the replica of *_SHARD_2_* positioned on {*}_NODE_3_{*}.
> _*NODE_1*_ is not an eligible destination for this replica as this node is
> already assigned a replica of {*}_SHARD_2_{*}, and both *_NODE_0_* and
> _*NODE_2*_ can be chosen as they host the same number of replicas.
> *_NODE_0_* is arbitrarily chosen amongst the two best candidate destination
> nodes.
> {code:java}
>         | NODE_0  | NODE_1  | NODE_2  | NODE_3  | NODE_4  |
> --------+---------+---------+---------+---------+---------+
> SHARD_1 |    X    |    X    |         |         |         |
> SHARD_2 |    X    |    X    |         |         |         |
> SHARD_3 |         |         |    X    |         |    X    |{code}
> h4. Move 3
> We are moving the replica of *_SHARD_3_* positioned on {*}_NODE_4_{*}.
> _*NODE_2*_ is not an eligible destination for this replica as this node is
> already assigned a replica of {*}_SHARD_3_{*}, and both *_NODE_0_* and
> _*NODE_1*_ can be chosen as they host the same number of replicas.
> *_NODE_1_* is arbitrarily chosen amongst the two best candidate destination
> nodes.
> {code:java}
>         | NODE_0  | NODE_1  | NODE_2  | NODE_3  | NODE_4  |
> --------+---------+---------+---------+---------+---------+
> SHARD_1 |    X    |    X    |         |         |         |
> SHARD_2 |    X    |    X    |         |         |         |
> SHARD_3 |         |    X    |    X    |         |         |{code}
>
> The test will then fail because the replicas are not evenly positioned across
> the non-vacated nodes, although this is arguably the expected outcome given
> the current implementation of the Simple placement strategy.
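
The three moves described in the issue can be replayed with a small standalone Python sketch. This is not Solr code: `best_candidates`, `migrate` and the scripted tie-breaker are hypothetical names, and the selection rule (skip nodes already hosting the shard, then keep only the least-loaded nodes) is an assumption that approximates the behaviour the description attributes to the Simple placement strategy.

```python
from collections import Counter

NODES = ["NODE_0", "NODE_1", "NODE_2", "NODE_3", "NODE_4"]


def best_candidates(placement, shard, vacated):
    """Least-loaded non-vacated nodes that do not already host `shard`."""
    load = Counter(node for hosts in placement.values() for node in hosts)
    eligible = [n for n in NODES if n not in vacated and n not in placement[shard]]
    fewest = min(load[n] for n in eligible)
    return [n for n in eligible if load[n] == fewest]


def migrate(placement, vacated, choose):
    """Move replicas off the vacated nodes one at a time.

    `choose` breaks ties between equally good candidates (assumed to be
    arbitrary in the real system). Returns the replica count per node.
    """
    placement = {shard: set(hosts) for shard, hosts in placement.items()}
    for source in sorted(vacated):
        for shard in sorted(placement):
            if source in placement[shard]:
                target = choose(best_candidates(placement, shard, vacated))
                placement[shard].remove(source)
                placement[shard].add(target)
    return Counter(node for hosts in placement.values() for node in hosts)


# Initial state from the issue description.
initial = {
    "SHARD_1": {"NODE_0", "NODE_3"},
    "SHARD_2": {"NODE_1", "NODE_3"},
    "SHARD_3": {"NODE_2", "NODE_4"},
}
vacated = {"NODE_3", "NODE_4"}

# Tie-breaks taken in Move 1, Move 2 and Move 3 of the failing scenario.
scripted = iter(["NODE_1", "NODE_0", "NODE_1"])


def scripted_choice(candidates):
    pick = next(scripted)
    assert pick in candidates  # each scripted pick must be a valid tie-break
    return pick


uneven = migrate(initial, vacated, scripted_choice)
print(sorted(uneven.items()))  # [('NODE_0', 2), ('NODE_1', 3), ('NODE_2', 1)]
```

Passing `max` as the tie-breaker instead makes Move 1 pick NODE_2, and this toy model then ends with 2 replicas on each remaining node. In this model, whether the spread is even depends only on how ties are broken.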