Pierre Salagnac created SOLR-17107:
--------------------------------------
Summary: Leader election is unpredictable if two threads join
concurrently election of the same replica
Key: SOLR-17107
URL: https://issues.apache.org/jira/browse/SOLR-17107
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Affects Versions: 9.3, 8.11
Reporter: Pierre Salagnac
There is a race condition in leader election if two threads concurrently run the
election for the same replica. This is not about how leader election is
distributed across multiple Solr nodes, but how multiple threads in a single
Solr node conflict with each other.
Overall, when two threads (on the same server) concurrently join leader
election for the same replica, the outcome is unpredictable. It may end with two
nodes thinking they are the leader, or with no leader at all.
h2. How to reproduce
I identified two scenarios, but maybe there are more:
*1. ZooKeeper session expires while an election is already in progress.*
When we re-create the ZooKeeper session, we re-register all the cores and join
elections for all of them. If an election is already in progress, or is
triggered for any reason, we can have two threads on the same Solr server node
running leader election for the same core.
*2. Command REJOINLEADERELECTION is received twice concurrently for the same
core.*
This scenario is much easier to reproduce with an external client. It occurs
for us since we have customizations using this command.
h2. Full analysis
There are at least two issues in the current code.
*1. We blindly delete ZK nodes that were created by other threads*
Right after we create our ephemeral sequential ZK node to join the election
queue, we check whether there are other ZK nodes for the same session ID (so
the same Solr server). When other nodes are found, we just delete them,
but we don't stop the election for any of the threads. It is likely that both
threads will think they won the election.
In addition, if two threads join the election concurrently, it is possible that
each deletes the sequential node of the other thread. In the end, no node
remains in the queue, so if another node joins the election later, it will not
see that there may already be a leader.
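The interleaving described above can be replayed deterministically. This is only
a minimal sketch under assumptions: the queue is an in-memory set rather than
ZooKeeper, and the class {{MutualDeleteSketch}}, its methods and the node names
are hypothetical, not taken from Solr.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical in-memory replay of the mutual delete; not Solr code. */
final class MutualDeleteSketch {

    /** Nodes in the queue that share the owner prefix but are not our own node. */
    static List<String> siblings(TreeSet<String> queue, String ownerPrefix, String self) {
        return queue.stream()
                .filter(n -> n.startsWith(ownerPrefix) && !n.equals(self))
                .collect(Collectors.toList());
    }

    /** Replays one bad interleaving and returns what is left in the queue. */
    static TreeSet<String> runInterleaving() {
        TreeSet<String> queue = new TreeSet<>();
        String prefix = "s1-core_node2-";       // assumed naming, for illustration only
        String a = prefix + "n_0000000004";
        String b = prefix + "n_0000000005";

        queue.add(a);                           // thread A creates its node
        queue.add(b);                           // thread B creates its node

        // Both threads list the queue before either one has deleted anything.
        List<String> seenByA = siblings(queue, prefix, a);
        List<String> seenByB = siblings(queue, prefix, b);

        queue.removeAll(seenByA);               // A deletes B's node
        queue.removeAll(seenByB);               // B deletes A's node
        return queue;
    }
}
```

With this ordering, {{runInterleaving()}} returns an empty set: neither thread has
a node left in the queue, yet both continue with the election.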
The fix for this issue would be to have one of the two threads abort the
election, without deleting the node of the other thread.
The election process should be continued only by the thread with the smallest
sequence number in the queue.
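A minimal sketch of that rule, under assumptions: node names are taken to end in
{{-n_<sequence>}} and to start with a prefix identifying the owner, and the class
{{ElectionQueueSketch}} and its methods are hypothetical names, not existing Solr
APIs.

```java
import java.util.List;

/** Hypothetical sketch of the "smallest sequence number wins" rule; not Solr code. */
final class ElectionQueueSketch {

    /** Parses the trailing sequence number, e.g. "...-n_0000000007" gives 7. */
    static long sequenceOf(String nodeName) {
        int idx = nodeName.lastIndexOf("-n_");
        if (idx < 0) {
            throw new IllegalArgumentException("Not an election node: " + nodeName);
        }
        return Long.parseLong(nodeName.substring(idx + 3));
    }

    /**
     * Returns true if the thread that created myNode should keep running the
     * election, i.e. no other node with the same owner prefix has a smaller
     * sequence number. Nothing is deleted here.
     */
    static boolean shouldContinue(String myNode, String ownerPrefix, List<String> queue) {
        long mine = sequenceOf(myNode);
        for (String other : queue) {
            if (!other.equals(myNode)
                    && other.startsWith(ownerPrefix)
                    && sequenceOf(other) < mine) {
                return false; // an older node of ours exists: this thread should abort
            }
        }
        return true;
    }
}
```

A thread for which {{shouldContinue}} returns false would remove only its own
node and stop, leaving the node of the other thread untouched.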
*2. Mutability around {{LeaderElector}} and contexts*
Another issue is that any thread can change the context of {{LeaderElector}}
instances. This can be done either by invoking {{setup()}} (mostly after ZK
session expiration) or {{retryElection()}}.
When we change the context, the old one is closed, but we don't take into
account the exact state of the election if another thread is currently
joining with the old context.
I am not sure exactly what the fix for this would be.
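One possible direction, given only as an untested assumption and not as a
proposed patch: let a joining thread detect that its context has been replaced
before it acts on it. The class {{ElectorSketch}} and its members below are
hypothetical and do not mirror the real {{LeaderElector}}.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of detecting a replaced context; not Solr code. */
final class ElectorSketch {

    /** Stand-in for an election context that can be closed. */
    static final class Context {
        private volatile boolean closed;
        final String name;

        Context(String name) { this.name = name; }

        void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    private final AtomicReference<Context> current = new AtomicReference<>();

    /** Installs a new context, closes the previous one and returns it (or null). */
    Context swap(Context next) {
        Context old = current.getAndSet(next);
        if (old != null) {
            old.close();
        }
        return old;
    }

    /** A joining thread would call this before each step that uses its context. */
    boolean stillValid(Context mine) {
        return current.get() == mine && !mine.isClosed();
    }
}
```

This check alone is still a check-then-act: the context can be swapped right
after {{stillValid}} returns true, so a real fix would likely also need the swap
and the join steps to be serialized.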