[ 
https://issues.apache.org/jira/browse/SOLR-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316393#comment-16316393
 ] 

Andrzej Bialecki  commented on SOLR-11714:
------------------------------------------

The current behavior of the framework is trappy: a user who adds a 
{{searchRate}} trigger also has to modify the preferences to avoid the loop, 
and forgetting to do so can bring autoscaling down.

There are two things that we can do here. First, {{ComputePlanAction}} should 
be able to detect infinite (or very long) loops based roughly on the cluster 
size and the total number of replicas across the cluster. E.g., if we have a 
cluster of 10 nodes and 20 replicas but the loop generated 1000 operations, 
then something is definitely wrong.
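
A rough sketch of what such a guard could look like. This is only an 
illustration, not code from the patch: the class {{OperationBudget}}, its 
methods and the {{factor}} multiplier are all hypothetical names, and the rule 
"factor * (nodes + replicas)" is just one possible heuristic.

```java
// Hypothetical sketch: OperationBudget does not exist in Solr.
// It caps how many operations a single planning loop may generate,
// scaled by the size of the cluster.
final class OperationBudget {
    private final int maxOperations;
    private int used;

    OperationBudget(int nodeCount, int replicaCount, int factor) {
        this.maxOperations = limitFor(nodeCount, replicaCount, factor);
    }

    // Assumed heuristic: allow at most factor * (nodes + replicas) operations.
    // Computed in long to avoid int overflow, and never lower than 1.
    static int limitFor(int nodeCount, int replicaCount, int factor) {
        long limit = (long) factor * ((long) nodeCount + replicaCount);
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1L, limit));
    }

    // Returns true if one more operation may be added to the plan,
    // false once the budget is exhausted (the caller should stop and log).
    boolean tryAcquire() {
        if (used >= maxOperations) {
            return false;
        }
        used++;
        return true;
    }

    int used() {
        return used;
    }
}
```

With 10 nodes, 20 replicas and a factor of 10 the budget is 300 operations, so 
a loop heading towards 1000 operations would be cut off well before that.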

Second, can we use some default limit, e.g. 2 * replication factor or 
something similar, for the ADDREPLICA suggester, at least for events produced 
by the {{searchRate}} trigger? Where do you think this default should be 
initialized?
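
To make the proposed default concrete, here is a minimal sketch under the 
assumption that the cap is exactly 2 * replication factor per shard. The class 
{{AddReplicaCap}} and its methods are hypothetical and only illustrate the 
arithmetic; where the default should actually live is still the open question.

```java
// Hypothetical sketch: AddReplicaCap does not exist in Solr.
final class AddReplicaCap {

    // Assumed default: a shard may grow to at most 2 * replicationFactor
    // replicas. A replication factor below 1 is treated as 1.
    static int defaultMaxReplicas(int replicationFactor) {
        return 2 * Math.max(1, replicationFactor);
    }

    // How many of the requested ADDREPLICA operations may still be issued
    // for a shard that currently has currentReplicas replicas.
    static int replicasToAdd(int currentReplicas, int requested, int replicationFactor) {
        int room = Math.max(0, defaultMaxReplicas(replicationFactor) - currentReplicas);
        return Math.min(Math.max(0, requested), room);
    }
}
```

For example, with a replication factor of 2 and 2 existing replicas, a request 
for 5 more replicas would be trimmed to 2, and once the shard has 4 replicas 
no further ADDREPLICA operations would be produced.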

> AddReplicaSuggester endless loop
> --------------------------------
>
>                 Key: SOLR-11714
>                 URL: https://issues.apache.org/jira/browse/SOLR-11714
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: AutoScaling
>    Affects Versions: 7.2, master (8.0)
>            Reporter: Andrzej Bialecki 
>            Assignee: Noble Paul
>         Attachments: 7.2-disable-search-rate-trigger.diff, SOLR-11714.diff
>
>
> {{SearchRateTrigger}} events are processed by {{ComputePlanAction}} and 
> depending on the condition either a MoveReplicaSuggester or 
> AddReplicaSuggester is selected.
> When {{AddReplicaSuggester}} is selected there's currently a bug in master, 
> due to an API change (Hint.COLL_SHARD should be used instead of Hint.COLL). 
> However, after fixing that bug {{ComputePlanAction}} goes into an endless 
> loop because the suggester endlessly keeps creating new operations.
> Please see the patch that fixes the Hint.COLL_SHARD issue and modifies the 
> unit test to illustrate this failure.



