[ 
https://issues.apache.org/jira/browse/SOLR-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  resolved SOLR-11320.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: master (8.0)
                   7.2

> Lock autoscaling triggers when changes they requested are being made
> --------------------------------------------------------------------
>
>                 Key: SOLR-11320
>                 URL: https://issues.apache.org/jira/browse/SOLR-11320
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: AutoScaling
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 7.2, master (8.0)
>
>
> Autoscaling triggers generate events that are then processed by actions such 
> as ComputePlanAction and ExecutePlanAction. This process is far from 
> instantaneous: it can take several seconds or even minutes to, e.g., move 
> or add replicas.
> The original condition that caused the first event will usually persist 
> during this time, and once the {{waitFor}} period has elapsed it will lead 
> to a new event being generated. That event is queued for execution once the 
> previous actions are completed, but by that time the original condition may 
> have been alleviated by these actions, so the conditions reported in the new 
> event no longer reflect the latest cluster state.
> For this reason some autoscaling frameworks introduce a "cooldown" period, 
> where triggers are temporarily disabled for a fixed period of time to avoid 
> piling up new events while cluster changes are being made. This method 
> introduces a fixed delay that is specific to a trigger.
> From the point of view of control theory, a feedback loop design should 
> minimize inherent delays because they are very hard to compensate for 
> properly. They lead either to instability (when the controller tries to 
> compensate for an out-of-step state) or to increased system lag (the system 
> reacts sluggishly to changes because it has to wait for things to settle 
> down). A fixed delay, which is also hard to estimate properly and may be 
> inadequate under varying conditions, is therefore not ideal.
> A better alternative would be to lock the trigger only for the time during 
> which changes are actually being made. Initially this could be 
> implemented as a global lock for all triggers for the duration of 
> modifications performed by ExecutePlanAction.
> Currently, cluster modifications executed by ExecutePlanAction are made 
> asynchronously, so it's hard to determine when the changes actually take 
> effect, e.g. when a new (or moved) replica becomes active. This would have 
> to be changed as well.
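
The global lock proposed in the description can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the
code committed for this issue: the class GlobalTriggerLock and its methods
fire and executePlan are hypothetical names and are not part of any Solr
API. The sketch assumes that the plan is executed synchronously, so the
lock is held for exactly as long as the changes take.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a global trigger lock; not a Solr class. */
final class GlobalTriggerLock {
    // True only while cluster changes are actually being made.
    private final AtomicBoolean changesInProgress = new AtomicBoolean(false);

    /**
     * Called by a trigger that wants to emit an event.
     * Returns false (the event is not accepted) while changes are in progress,
     * so no stale events pile up behind the running plan.
     */
    boolean fire(String event) {
        return !changesInProgress.get();
    }

    /**
     * Stand-in for plan execution. The Runnable is assumed to block until
     * the cluster changes have taken effect, so the lock covers the real
     * duration of the changes rather than a fixed cooldown period.
     */
    void executePlan(Runnable clusterChanges) {
        if (!changesInProgress.compareAndSet(false, true)) {
            throw new IllegalStateException("a plan is already being executed");
        }
        try {
            clusterChanges.run();
        } finally {
            // Always release, even if the changes fail.
            changesInProgress.set(false);
        }
    }
}
```

In this sketch a trigger whose event is rejected would simply re-evaluate
the cluster state on its next run, after the lock is released, instead of
queueing an event based on conditions that the running plan may already
have resolved.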


