I think I understand your scenario now, or at least I hope I have made some progress...

The challenge is to coordinate river execution across independent nodes,
especially during a critical cluster restart phase. Because river instances
are autonomous by design, the result is unbalanced: the ES RiversRouter
does not care how many river instances are "allowed" on a node, so it
happily starts 10 rivers on a single node that were previously spread
across 5 nodes.

To address the affinity problem, I hope I can make progress on a
"gatherer" framework that is able to distribute jobs among many nodes:
https://github.com/jprante/elasticsearch-gatherer

Each gatherer executes jobs in a thread pool. To avoid overloading a
single gatherer node with jobs (e.g. in a critical cluster restart phase),
I plan to let a reschedule thread check the cluster state for available
gatherer nodes and their capacities (which can be the job queue length, or
the system load). Jobs, once started, are owned by gatherer nodes, and
added to a waiting queue. The queue can be examined, and jobs could also be
reassigned to another gatherer with more available capacity. That is the
basic idea.
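
To make the reschedule idea a bit more concrete, here is a minimal sketch in
Python. It is not code from the gatherer project: `GathererNode`, `capacity`,
`spare()` and `reschedule()` are hypothetical names, and capacity is assumed
to be a simple limit on the waiting queue length (system load would work the
same way). One pass moves waiting jobs from overloaded nodes to whichever
node currently has the most room.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GathererNode:
    """Hypothetical stand-in for a gatherer node (not the project's API)."""
    name: str
    capacity: int  # assumption: max number of waiting jobs the node should own
    queue: deque = field(default_factory=deque)  # waiting jobs owned by this node

    def spare(self) -> int:
        # Negative when the node owns more jobs than its capacity allows.
        return self.capacity - len(self.queue)


def reschedule(nodes):
    """One pass of the reschedule check.

    Moves waiting jobs off overloaded nodes onto the node with the most
    spare capacity. Returns the list of (job, from_node, to_node) moves.
    """
    moves = []
    for source in nodes:
        while source.spare() < 0:
            target = max(nodes, key=lambda n: n.spare())
            if target.spare() <= 0:
                return moves  # no node has room left, leave the rest queued
            job = source.queue.pop()  # reassign the most recently queued job
            target.queue.append(job)
            moves.append((job, source.name, target.name))
    return moves
```

For the restart scenario above, with 10 jobs landing on one node and four
other nodes idle (each with an assumed capacity of 2), a single call to
`reschedule(nodes)` would produce 8 moves and leave every node with 2 jobs.
A real reschedule thread would read node capacities from the cluster state
rather than from in-memory objects.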

Sorry, I know the gatherer project will not help to improve the current
river installations any time soon. Just my 2c to let you know about one of
my spare time activities ;)

Jörg
