[
https://issues.apache.org/jira/browse/SOLR-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexandre Rafalovitch closed SOLR-12480.
----------------------------------------
> TriggerAction failures may cause inconsistent trigger behavior
> --------------------------------------------------------------
>
> Key: SOLR-12480
> URL: https://issues.apache.org/jira/browse/SOLR-12480
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: AutoScaling
> Affects Versions: 7.4, master (8.0)
> Reporter: Andrzej Bialecki
> Priority: Major
>
> The following issue occasionally appears when running
> {{TestLargeCluster.testNodeLost}}.
> The test kills a large number of nodes, waiting for a certain time between
> the kills. Depending on the kill sequence and the length of {{waitFor}}, the
> target node may have just been killed by the time {{ExecutePlanAction}}
> processes MOVEREPLICA. This results in an exception and a FAILED status
> for the action.
> However, this failure is not reported back to the trigger as an unprocessed
> event because it happens asynchronously in the action executor (in
> {{ScheduledTriggers}}), so the trigger resets its internal state and
> no longer tracks the lost node. As a result, replicas remain lost, and even if
> there’s a Policy violation the event will not be generated again, so the
> number of replicas won’t go back to the original number.
> Also, {{ScheduledTriggers:311}} and 323 only log the exception but don’t
> fire listeners with FAILED status, which is a bug.
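
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Solr code: `TriggerSketch`, `LostNodeTracker`, `fireAndForget`, and `deferReset` are hypothetical names invented for illustration, and the "action" is just a predicate that throws to simulate MOVEREPLICA hitting a dead target node. The first method resets the tracked state as soon as the action is handed to the executor, so the failure is swallowed; the second resets only after the asynchronous action reports success, which is one possible way to keep the lost node tracked for a later retry.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

// Hypothetical sketch; none of these types are Solr classes.
class TriggerSketch {

    // Stand-in for the trigger's internal record of lost nodes.
    static class LostNodeTracker {
        private final Set<String> tracked = new HashSet<>();
        synchronized void track(String node) { tracked.add(node); }
        synchronized void forget(String node) { tracked.remove(node); }
        synchronized boolean isTracked(String node) { return tracked.contains(node); }
    }

    // Problematic pattern: submit the action, then reset state immediately.
    // Returns whether the node is still tracked afterwards.
    static boolean fireAndForget(LostNodeTracker tracker, String node,
                                 Predicate<String> action, ExecutorService executor) {
        CompletableFuture<Boolean> result =
            CompletableFuture.supplyAsync(() -> action.test(node), executor);
        tracker.forget(node); // reset happens regardless of the outcome
        // The failure is swallowed here; join() only makes the demo deterministic.
        result.exceptionally(err -> false).join();
        return tracker.isTracked(node);
    }

    // Alternative: reset state only once the async action reports success.
    static boolean deferReset(LostNodeTracker tracker, String node,
                              Predicate<String> action, ExecutorService executor) {
        CompletableFuture<Boolean> result =
            CompletableFuture.supplyAsync(() -> action.test(node), executor);
        // Map both an exception and a false result to "not succeeded".
        CompletableFuture<Boolean> succeeded =
            result.handle((ok, err) -> err == null && ok);
        CompletableFuture<Void> done = succeeded.thenAccept(success -> {
            if (success) {
                tracker.forget(node);
            }
            // On failure the node stays tracked, so a later run can retry.
        });
        done.join();
        return tracker.isTracked(node);
    }

    // Runs both variants against an action that always fails.
    static boolean[] runDemo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Predicate<String> failingMove = node -> {
                throw new IllegalStateException("target node for " + node + " is down");
            };
            LostNodeTracker first = new LostNodeTracker();
            first.track("node1");
            boolean afterFireAndForget = fireAndForget(first, "node1", failingMove, executor);

            LostNodeTracker second = new LostNodeTracker();
            second.track("node1");
            boolean afterDeferReset = deferReset(second, "node1", failingMove, executor);

            return new boolean[] { afterFireAndForget, afterDeferReset };
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] stillTracked = runDemo();
        System.out.println("fire-and-forget still tracks node1: " + stillTracked[0]);
        System.out.println("deferred reset still tracks node1: " + stillTracked[1]);
    }
}
```

With the failing action, the fire-and-forget variant ends up no longer tracking `node1`, while the deferred variant still tracks it. A real fix would also need to notify listeners with a FAILED status at the point where the exception is currently only logged; the sketch leaves that out.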