[ https://issues.apache.org/jira/browse/SOLR-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966805#comment-16966805 ]
Yonik Seeley edited comment on SOLR-13884 at 11/4/19 4:56 PM:
--------------------------------------------------------------
OK, I updated the test to reproduce another serious bug with replica placement
and concurrent collection creation.
When collection-level policies are used and the cluster is currently
unbalanced, it's relatively easy to get into a situation where multiple
replicas of a shard are assigned to the exact same node. In the wild, I've
actually seen all 5 replicas of a single shard assigned to the same node, and
I've been able to reproduce that with my test case.
The test case is currently set up to reproduce the simplest case I could
manage. We start off with just 2 nodes, create a single replica on one node,
then issue 2 collection create commands concurrently (each with 1 shard and
replicationFactor=2). Pretty much 100% of the time, one of the new shards ends
up with both replicas on the same node. This does not happen if the creations
are done serially. It also doesn't happen if an identical cluster-level policy
is specified.
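To make the failure condition concrete, here is a minimal sketch of the kind of check a test could run after the concurrent creates finish. It is not the actual test code and does not use any Solr API: the PlacementCheck class, its Replica type, and the collection and node names are all hypothetical, and the check only assumes that each replica can be described by its collection, shard, and hosting node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Solr): finds shards with more than one replica on a node. */
final class PlacementCheck {

    /** One placed replica: the collection and shard it belongs to, and the node hosting it. */
    static final class Replica {
        final String collection;
        final String shard;
        final String node;

        Replica(String collection, String shard, String node) {
            this.collection = collection;
            this.shard = shard;
            this.node = node;
        }
    }

    /**
     * Returns one entry for every (collection, shard, node) combination that hosts
     * more than one replica, formatted as "collection/shard@node xCOUNT".
     * An empty list means no shard has two replicas on the same node.
     */
    static List<String> findViolations(List<Replica> replicas) {
        // Count replicas per (collection, shard, node); insertion order is kept for stable output.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Replica r : replicas) {
            String key = r.collection + "/" + r.shard + "@" + r.node;
            counts.merge(key, 1, Integer::sum);
        }

        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                violations.add(e.getKey() + " x" + e.getValue());
            }
        }
        return violations;
    }
}
```

In the scenario above, a correct placement would put the two replicas of each new collection on different nodes, so the list would be empty. A non-empty result corresponds to the bug being reproduced.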
I've updated the title / description of this issue to match the described
problem and opened a new issue (SOLR-13891) for the default replica placement
problem.
> Concurrent collection creation leads to multiple replicas placed on same node
> -----------------------------------------------------------------------------
>
> Key: SOLR-13884
> URL: https://issues.apache.org/jira/browse/SOLR-13884
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Yonik Seeley
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> When multiple collection creations are done concurrently with a
> collection-level policy, multiple replicas of a single shard can end up on
> the same node, violating the specified policy.
> This was observed on both 8.2 and master.