Houston Putman created SOLR-16063:
-------------------------------------

             Summary: Make the ZK ConnectionManager events actually 
single-threaded
                 Key: SOLR-16063
                 URL: https://issues.apache.org/jira/browse/SOLR-16063
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrJ
            Reporter: Houston Putman


The implementation of a single-threaded executor of ConnectionManger events in 
the SolrZkClient is fundamentally broken.

This is because the ClientCnxn class in Zookeeper (or SolrZookeeper in our 
case) also tries to do the same thing, use a single-threaded executor for these 
events. I believe the original intent was for all watch events to be routed to 
the {{zkConnManagerCallbackExecutor}} in SolrZkClient. However due to the way 
that the {{ProcessWatchWithExecutor}}  wrapper class sends these watch events 
to the executor, we get the opposite functionality: The SolrZkClient executor 
quickly "processes" these events, which merely get added to the ClientCnxn 
thread event queue, which actually handles all of these watch events.

There are 2 major problem here.
# We cannot close the ClientCnxn eventThread, whenever ClientCnxn is closed by 
SolrZookeeper (parent class of Zookeeper), it merely sends a *kill event* to 
the eventThread queue, and wait for the queue to reach the end. I will detail 
this later, but the event processing we do leads to continually looping event 
threads, so the *kill event* is never reached.
# There is not 1 ClientCnxn eventThread, because there is not 1 SolrZookeeper. 
Whenever we need to re-establish a connection, such as in the case of a 
sessionExpired event, a new SolrZookeeper is created and the old one is closed. 
Therefore we have multiple eventThreads being used if the old SolrZookeeper 
cannot close its eventThread.

Given the above two points, we now have multiple eventThreads that we do not 
control, and cannot close.

The solution is to fix the way that the ConnectionManager sends eventProcessing 
requests to the SolrZkClient executor. The reason why 
{{ProcessWatchWithExecutor}} doesn't work is that it isn't available in the 
ConnectionManager, when the ConnectionManger is creating a new SolrZookeeper. 
The SolrZookeeper requires a default watcher to send a connection events to, 
and since the ConnectionManger can't pass itself wrapped in the SolrZkClient's 
{{ProcessWatchWithExecutor}}, all of those requests are skipping the 
SolrZkClient executor. If instead of wrapping the ConnectionManager in 
{{ProcessWatchWithExecutor}}, we just pass an optional executor to 
ConnectionManager, then it will always schedule events in the executor, even if 
it isn't wrapped. This way the eventThread in ClientCnxn will be very quick to 
process, since it will merely call {{ConnectionManager.process()}} which will 
instantly schedule the processing to run in the SolrZkClient executor. 
Therefore the ClientCnxn event thread will instantly close when requested, 
since there will never be a backlog of processing events.

This solution also guarantees that every ConnectionManager event goes through 
the SolrZkClient's single-threaded executor, meaning that we are truly 
achieving our initial goal.

There are two tickets that cause this approach to fail:
* SOLR-4899: This makes the event processing thread wait until the zk client is 
connected. This makes sense, because the ConnectionManager is waiting for a 
Connected event to be processed, which can't be because the single-threaded 
executor is stalling.
* SOLR-8599: This also pauses the single-thread executor and makes it so that 
no other events can be processed.

I think this is the approach to take if we want to actually take this 
single-threaded approach, but we need to make sure undoing the work in the two 
issues mentioned above does not make Solr less stable.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to