Currently, when a replica is watching the current leader's ephemeral node and the leader disappears, it runs the leadership check, along with its two-way peer sync, ZK update, etc., on the ZK event thread where the watch fired.
What this means is that on instances with lots of cores, leadership elections are serialized, and the last core in the list could wait a long time to have a replacement elected (during which you will have no leader for that shard). I made a quick change to make the checkIfIAmLeader call async, but SolrCloud tests being what they are (thanks Shalin for cleaning them up btw :) ), I wanted to check whether I am doing something stupid. If not, I will raise a JIRA. One point of contention could be whether you might end up with two elections for the same shard, but I can't see how that might happen.
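
To make the idea concrete, here is a rough sketch of the kind of hand-off I mean. It is not the actual patch and it uses no Solr or ZooKeeper classes: AsyncLeaderCheck, onLeaderNodeDeleted, the pool size of 4 and the in-flight set are all made-up names and choices for illustration, and the Runnable only stands in for the real checkIfIAmLeader work. The in-flight set is one possible guard against two elections for the same shard. Dropping a duplicate event as it does here may well be wrong for the real code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: moves leader checks off the single event thread. */
class AsyncLeaderCheck {
    // Worker pool for elections; the size is an arbitrary illustrative choice.
    private final ExecutorService electionPool = Executors.newFixedThreadPool(4);
    // Shards with an election in flight, so a shard never runs two at once.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();
    // Number of leader checks that ran to completion.
    private final AtomicInteger completed = new AtomicInteger();

    /**
     * Meant to be called on the event thread when the watched leader node
     * disappears. Returns true if a check was scheduled, false if one is
     * already running for this shard.
     */
    boolean onLeaderNodeDeleted(String shardId, Runnable checkIfIAmLeader) {
        if (!inFlight.add(shardId)) {
            return false; // an election for this shard is already running
        }
        electionPool.submit(() -> {
            try {
                checkIfIAmLeader.run(); // stands in for peer sync, ZK update, etc.
                completed.incrementAndGet();
            } finally {
                inFlight.remove(shardId); // allow later elections for this shard
            }
        });
        return true; // returns right away, so the event thread is not blocked
    }

    int completedCount() {
        return completed.get();
    }

    /** Stops accepting work and waits for running checks to finish. */
    boolean shutdownAndWait(long timeoutSeconds) {
        electionPool.shutdown();
        try {
            return electionPool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With something like this, a slow election for one core no longer delays the watch callbacks for the other cores on the same instance, since the event thread only pays for the submit call.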
