[ https://issues.apache.org/jira/browse/SOLR-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki resolved SOLR-11320.
--------------------------------------
Resolution: Fixed
Fix Version/s: master (8.0), 7.2
> Lock autoscaling triggers when changes they requested are being made
> --------------------------------------------------------------------
>
> Key: SOLR-11320
> URL: https://issues.apache.org/jira/browse/SOLR-11320
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: AutoScaling
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Fix For: 7.2, master (8.0)
>
>
> Autoscaling triggers generate events that are then processed by actions such
> as ComputePlanAction and ExecutePlanAction. This process is far from
> instantaneous: it can sometimes take several seconds or even minutes to,
> e.g., move or add replicas.
> The original condition that caused the first event will usually persist
> during this time, and once the {{waitFor}} period has elapsed it will lead
> to a new event being generated. That event is queued for execution once the
> previous actions are completed, but by that time the original condition may
> have been alleviated by these actions, so the conditions reported in the new
> event no longer reflect the latest cluster state.
> For this reason some autoscaling frameworks introduce a "cooldown" period,
> where triggers are temporarily disabled for a fixed period of time to avoid
> piling up new events while cluster changes are being made. This method
> introduces a fixed delay that is specific to a trigger.
> From the point of view of control theory, a feedback loop design should
> minimize inherent delays because they are very hard to compensate for
> properly. They lead either to instability (when the controller tries to
> compensate for an out-of-step state) or to increased system lag (the system
> reacts sluggishly to changes because it has to wait for things to settle
> down). From this point of view a fixed delay, which is also hard to estimate
> properly and may be inadequate under varying conditions, is not ideal.
> A better alternative would be to lock the trigger only for as long as
> changes are actually being made. Initially this could be
> implemented as a global lock for all triggers for the duration of
> modifications performed by ExecutePlanAction.
> Currently, cluster modifications executed by ExecutePlanAction are made
> asynchronously, so it's hard to determine when the changes actually take
> effect, e.g. when a new (or moved) replica becomes active. This would have
> to be changed as well.
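
The locking idea described above can be sketched in a few lines of Java. This
is only an illustration under stated assumptions, not the code committed for
this issue: the class TriggerLockSketch and its methods fire and isLocked are
hypothetical names, and a cluster change is modeled as a CompletableFuture
that completes once the change has actually taken effect (for example, once a
new replica is active). A single global flag is held for exactly as long as
the change is in flight, and events fired in the meantime are ignored rather
than queued.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are taken from Solr. */
final class TriggerLockSketch {
    // Global flag shared by all triggers: true while cluster changes are in flight.
    private final AtomicBoolean changesInProgress = new AtomicBoolean(false);

    /**
     * Called when a trigger fires. Returns false (event ignored) if changes
     * requested by an earlier event are still being made. Otherwise starts the
     * asynchronous cluster change, blocks until it reports completion, and
     * returns true.
     */
    boolean fire(Supplier<CompletableFuture<Void>> clusterChange) {
        if (!changesInProgress.compareAndSet(false, true)) {
            return false; // locked: no new event is accepted
        }
        try {
            // Wait for the change itself, not for a fixed cooldown period.
            clusterChange.get().join();
            return true;
        } finally {
            // Unlock as soon as the work ends, whether it succeeded or failed.
            changesInProgress.set(false);
        }
    }

    /** True while a cluster change started by fire() has not yet finished. */
    boolean isLocked() {
        return changesInProgress.get();
    }
}
```

With this shape the lock duration tracks the real duration of the change, so
there is no fixed delay to estimate. Whether ignored events should be dropped
or re-evaluated after the lock is released is a design choice that this
sketch leaves open.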
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)