[ 
https://issues.apache.org/jira/browse/SOLR-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130527#comment-17130527
 ] 

ASF subversion and git services commented on SOLR-14347:
--------------------------------------------------------

Commit d4f7c90bab3d3dd76c066aeb598544cf8250bac8 in lucene-solr's branch 
refs/heads/master from murblanc
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d4f7c90 ]

SOLR-14347: fix cached session update to not depend on Zookeeper state (#1542)

SOLR-14347: fix cached session update to not depend on Zookeeper state

> Autoscaling placement wrong when concurrent replica placements are calculated
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-14347
>                 URL: https://issues.apache.org/jira/browse/SOLR-14347
>             Project: Solr
>          Issue Type: Bug
>          Components: AutoScaling
>    Affects Versions: 8.5
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Critical
>             Fix For: 8.6
>
>         Attachments: SOLR-14347.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  * create a cluster of a few nodes (tested with 7 nodes)
>  * define per-collection policies that distribute replicas exclusively on 
> different nodes per policy
>  * concurrently create a few collections, each using a different policy
>  * resulting replica placement will be seriously wrong, causing many policy 
> violations
> Running the same scenario but instead creating collections sequentially 
> results in no violations.
> I suspect this is caused by incorrect locking level for all collection 
> operations (as defined in {{CollectionParams.CollectionAction}}) that create 
> new replica placements - i.e. CREATE, ADDREPLICA, MOVEREPLICA, DELETENODE, 
> REPLACENODE, SPLITSHARD, RESTORE, REINDEXCOLLECTION. All of these operations 
> use the policy engine to create new replica placements, and as a result they 
> change the cluster state. However, currently these operations are locked (in 
> {{OverseerCollectionMessageHandler.lockTask}} ) using 
> {{LockLevel.COLLECTION}}. In practice this means that the lock is held only 
> for the particular collection that is being modified.
> A straightforward fix for this issue is to change the locking level to 
> CLUSTER (and I confirm this fixes the scenario described above). However, 
> this effectively serializes all collection operations listed above, which 
> will result in general slow-down of all collection operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to