Currently when a replica is watching the current leader's ephemeral node
and the leader disappears, it runs the leadership check along with its two
way peer sync, ZK update etc. on the ZK event thread where the watch was
fired.

What this means is that for instances with lots of cores, you would be
serializing leadership elections and the last in the list could take a long
time to have a replacement elected (during which you will have no leader).

I did a quick change to make the checkIfIAmLeader call async, but Solr
cloud tests being what they are (thanks Shalin for cleaning them up btw :)
), I wanted to check if I am doing something stupid. If not, I will raise a
JIRA.

One contention could be if you might end up with two elections for the same
shard, but I can't see how that might happen..

Reply via email to