Houston Putman created SOLR-16063:
-------------------------------------
Summary: Make the ZK ConnectionManager events actually
single-threaded
Key: SOLR-16063
URL: https://issues.apache.org/jira/browse/SOLR-16063
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrJ
Reporter: Houston Putman
The implementation of a single-threaded executor of ConnectionManager events in
the SolrZkClient is fundamentally broken.
This is because the ClientCnxn class in Zookeeper (or SolrZookeeper in our
case) also tries to do the same thing: use a single-threaded executor for these
events. I believe the original intent was for all watch events to be routed to
the {{zkConnManagerCallbackExecutor}} in SolrZkClient. However, due to the way
that the {{ProcessWatchWithExecutor}} wrapper class sends these watch events
to the executor, we get the opposite behavior: the SolrZkClient executor
quickly "processes" these events, which merely adds them to the ClientCnxn
eventThread queue, and that eventThread is what actually handles all of these
watch events.
There are 2 major problems here.
# We cannot close the ClientCnxn eventThread. Whenever ClientCnxn is closed by
SolrZookeeper (a subclass of Zookeeper), it merely sends a *kill event* to
the eventThread queue and waits for the queue to reach the end. I will detail
this later, but the event processing we do leads to continually looping event
threads, so the *kill event* is never reached.
# There is not 1 ClientCnxn eventThread, because there is not 1 SolrZookeeper.
Whenever we need to re-establish a connection, such as in the case of a
sessionExpired event, a new SolrZookeeper is created and the old one is closed.
Therefore we have multiple eventThreads being used if the old SolrZookeeper
cannot close its eventThread.
Given the above two points, we now have multiple eventThreads that we do not
control, and cannot close.
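To illustrate the first problem, here is a minimal sketch of an event thread that only stops once it reaches a kill marker in its queue. It is not the ClientCnxn code: {{KillEventSketch}}, {{stopsWithin}} and the {{KILL}} sentinel are hypothetical names, and the stuck processing is simulated with an event that never returns rather than with the looping described above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not ZooKeeper code: a queue-draining event thread
// that can only exit by dequeuing a kill marker.
class KillEventSketch {
  // Sentinel playing the role of the "kill event" (assumed name).
  private static final Runnable KILL = () -> {};

  // Queues one event followed by the kill marker, then reports whether
  // the event thread terminated within the given time.
  static boolean stopsWithin(long millis, Runnable event) throws InterruptedException {
    BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    Thread eventThread = new Thread(() -> {
      try {
        while (true) {
          Runnable next = queue.take();
          if (next == KILL) {
            return; // the only way this thread ever stops
          }
          next.run(); // if this never returns, KILL is never dequeued
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "sketch-eventThread");
    eventThread.setDaemon(true); // let the JVM exit even if the thread is stuck
    eventThread.start();

    queue.add(event);
    queue.add(KILL);
    eventThread.join(millis);
    return !eventThread.isAlive();
  }
}
```

With a fast event the thread reaches the kill marker and exits. With an event that blocks forever the thread is still alive after the timeout, which is the situation a closed SolrZookeeper can be left in.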
The solution is to fix the way that the ConnectionManager sends eventProcessing
requests to the SolrZkClient executor. The reason why
{{ProcessWatchWithExecutor}} doesn't work is that it isn't available in the
ConnectionManager when the ConnectionManager is creating a new SolrZookeeper.
The SolrZookeeper requires a default watcher to send connection events to,
and since the ConnectionManager can't pass itself wrapped in the SolrZkClient's
{{ProcessWatchWithExecutor}}, all of those requests are skipping the
SolrZkClient executor. If instead of wrapping the ConnectionManager in
{{ProcessWatchWithExecutor}}, we just pass an optional executor to
ConnectionManager, then it will always schedule events in the executor, even if
it isn't wrapped. This way the eventThread in ClientCnxn will be very quick to
process, since it will merely call {{ConnectionManager.process()}} which will
instantly schedule the processing to run in the SolrZkClient executor.
Therefore the ClientCnxn event thread will instantly close when requested,
since there will never be a backlog of processing events.
This solution also guarantees that every ConnectionManager event goes through
the SolrZkClient's single-threaded executor, meaning that we are truly
achieving our initial goal.
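A rough sketch of the proposed shape, under stated assumptions: {{SketchWatcher}} and {{SketchConnectionManager}} are hypothetical stand-ins rather than the real {{Watcher}} and ConnectionManager, and events are plain strings. The point is only that {{process()}} hands the work to the executor and returns right away, whether or not the caller wrapped the watcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the ZooKeeper watcher callback.
interface SketchWatcher {
  void process(String event);
}

// Hypothetical ConnectionManager that owns an optional executor, so it
// behaves the same when passed unwrapped as the default watcher.
class SketchConnectionManager implements SketchWatcher {
  private final ExecutorService executor; // may be null
  final List<String> handled = Collections.synchronizedList(new ArrayList<>());

  SketchConnectionManager(ExecutorService executor) {
    this.executor = executor;
  }

  @Override
  public void process(String event) {
    if (executor == null) {
      doProcess(event); // no executor given: run on the caller's thread
    } else {
      executor.execute(() -> doProcess(event)); // schedule and return immediately
    }
  }

  private void doProcess(String event) {
    // Record which thread did the real work.
    handled.add(Thread.currentThread().getName() + ":" + event);
  }
}

class ExecutorSketchDemo {
  static List<String> run() throws InterruptedException {
    ExecutorService executor =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "zkConnManagerCallback"));
    SketchConnectionManager cm = new SketchConnectionManager(executor);
    // The calling thread plays the role of the ClientCnxn eventThread.
    cm.process("Disconnected");
    cm.process("Expired");
    cm.process("SyncConnected");
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.SECONDS);
    return new ArrayList<>(cm.handled);
  }
}
```

All three events are handled in submission order on the single executor thread, while the calling thread never does more than enqueue a task.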
There are two tickets that cause this approach to fail:
* SOLR-4899: This makes the event processing thread wait until the zk client is
connected. With a single-threaded executor this cannot work: the
ConnectionManager is waiting for a Connected event to be processed, which can
never happen because the single-threaded executor is stalled by that wait.
* SOLR-8599: This also pauses the single-thread executor and makes it so that
no other events can be processed.
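The stall described in these two tickets can be reproduced in isolation. The following is a simplified sketch, not the Solr code: {{StalledExecutorSketch}} and {{connectedSeenWhileWaiting}} are hypothetical names, and a {{CountDownLatch}} stands in for "the client is connected".

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a task on a single-threaded executor waits for a
// signal that only a later task on the same executor can deliver.
class StalledExecutorSketch {
  static boolean connectedSeenWhileWaiting(long waitMillis) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch connected = new CountDownLatch(1);
    try {
      // First event: its handler waits for the "connected" signal.
      Callable<Boolean> waitForConnected =
          () -> connected.await(waitMillis, TimeUnit.MILLISECONDS);
      Future<Boolean> waiter = executor.submit(waitForConnected);

      // Second event: the Connected event that would release the waiter.
      // It sits in the queue behind the waiter and cannot run.
      executor.execute(connected::countDown);

      return waiter.get(); // false: the wait timed out
    } finally {
      executor.shutdownNow();
    }
  }
}
```

The wait always times out, because the only thread that could process the Connected event is the one doing the waiting. That is why this approach needs the waits from both tickets to be reworked first.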
I think this is the approach to take if we want to actually take this
single-threaded approach, but we need to make sure undoing the work in the two
issues mentioned above does not make Solr less stable.