Andrzej Bialecki created SOLR-10745:
----------------------------------------
Summary: Reliably create nodeAdded / nodeLost events
Key: SOLR-10745
URL: https://issues.apache.org/jira/browse/SOLR-10745
Project: Solr
Issue Type: Sub-task
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki
Fix For: master (7.0)
When the Overseer node goes down, a {{nodeLost}} event may not be generated,
depending on the current phase of trigger execution. Similarly, when a new
node is added and the Overseer goes down before the trigger saves a checkpoint
(and before it produces a {{nodeAdded}} event), this event may never be generated.
The proposed solution would be to modify how nodeLost / nodeAdded information
is recorded in the cluster:
* new nodes should do a ZK multi-write to both {{/live_nodes}} and to a
predefined location, e.g. {{/autoscaling/nodeAdded/<nodeName>}}. On the first
execution of Trigger.run in the new Overseer leader, the trigger would check
this location for new znodes, each of which indicates that a node has been
added, then generate a new event and remove the znode that corresponds to it.
* node lost events should also be recorded to a predefined location, e.g.
{{/autoscaling/nodeLost/<nodeName>}}. Writing to this znode would be attempted
simultaneously by a few randomly selected nodes to make sure at least one of
them succeeds. On the first run of the new trigger instance (in the new
Overseer leader), event generation would follow the sequence described above.
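
The two steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the actual Solr implementation:
{{FakeZk}} is a hypothetical in-memory stand-in for ZooKeeper (its
{{multiCreate}} imitates an all-or-nothing multi-write), and the class and
method names ({{NodeEventMarkers}}, {{nodeJoins}}, {{reportNodeLost}},
{{firstRun}}) are invented for this example. Only the znode locations come
from the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class NodeEventMarkers {
    static final String LIVE = "/live_nodes/";
    static final String ADDED = "/autoscaling/nodeAdded/";
    static final String LOST = "/autoscaling/nodeLost/";

    /** Hypothetical in-memory stand-in for ZooKeeper; it stores paths only. */
    static class FakeZk {
        private final TreeSet<String> paths = new TreeSet<>();

        /** All-or-nothing create of several paths, imitating a ZK multi-write. */
        synchronized boolean multiCreate(String... newPaths) {
            for (String p : newPaths) {
                if (paths.contains(p)) return false; // nothing is written
            }
            for (String p : newPaths) paths.add(p);
            return true;
        }

        /** Returns false if the path already exists. */
        synchronized boolean create(String path) { return paths.add(path); }

        synchronized boolean delete(String path) { return paths.remove(path); }

        /** Names stored under the given prefix, in sorted order. */
        synchronized List<String> children(String prefix) {
            List<String> names = new ArrayList<>();
            for (String p : paths.tailSet(prefix)) {
                if (!p.startsWith(prefix)) break;
                names.add(p.substring(prefix.length()));
            }
            return names;
        }
    }

    /** A new node registers itself and its nodeAdded marker in one multi-write. */
    static boolean nodeJoins(FakeZk zk, String node) {
        return zk.multiCreate(LIVE + node, ADDED + node);
    }

    /** Called by each of the selected reporters; only the first create succeeds. */
    static boolean reportNodeLost(FakeZk zk, String lostNode) {
        return zk.create(LOST + lostNode);
    }

    /** First trigger run in a new Overseer: turn leftover markers into events. */
    static List<String> firstRun(FakeZk zk) {
        List<String> events = new ArrayList<>();
        drain(zk, ADDED, "nodeAdded", events);
        drain(zk, LOST, "nodeLost", events);
        return events;
    }

    private static void drain(FakeZk zk, String prefix, String type, List<String> events) {
        for (String node : zk.children(prefix)) {
            events.add(type + " " + node); // generate the event ...
            zk.delete(prefix + node);      // ... then remove its marker znode
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        nodeJoins(zk, "node1");
        reportNodeLost(zk, "node2"); // first reporter writes the marker
        reportNodeLost(zk, "node2"); // second reporter finds it already there
        System.out.println(firstRun(zk));
    }
}
```

Because the markers live outside the trigger's own state, a second call to
{{firstRun}} in this sketch finds nothing left to report. Against a real
ZooKeeper the create calls would also have to handle session and connection
errors, which this sketch ignores.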