[https://issues.apache.org/jira/browse/SOLR-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129936#comment-17129936]
Ilan Ginzburg commented on SOLR-14546:
--------------------------------------
Here are the changes done in
[https://github.com/apache/lucene-solr/pull/1561]:
{{Sessions}} are now effectively used when iterating over queued tasks. In
{{OverseerCollectionMessageHandler.lockTask()}}, {{sessionId}} is now assigned
the batch session id passed in, which was the main missing piece
({{Sessions}} were previously created, used once, then thrown away). The bulk
clearing of all locks was removed: it triggered {{“Unlocked multiple times”}}
and {{“lock_is_leaked”}} logs from {{LockTree}}. Even though no tasks were
running when locks were bulk cleared, there were race conditions because tasks
are removed from the running task set before their lock is “naturally”
released. If there are lock leaks we should fix them directly (but the code
looks pretty solid).
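As far as I understand the fix, the point of reusing a {{Session}} across the
whole batch is that a {{Session}} remembers lock refusals, so a later task
cannot overtake an earlier queued task that was just denied. A minimal,
self-contained sketch of the idea; {{LockTree}}, {{Session}} and {{Lock}} here
are simplified stand-ins, and the method shape only approximates
{{OverseerCollectionMessageHandler.lockTask()}}:
{code:java}
// Illustrative stand-ins for the Solr classes, not the actual code.
class LockSessionSketch {
  static class LockTree {
    Session getSession() { return new Session(); }
  }
  static class Session {
    // The real Session returns null when a competing lock is already held,
    // and remembers the refusal for the rest of the session.
    Lock lock(String path) { return new Lock(path); }
  }
  static class Lock {
    final String path;
    Lock(String path) { this.path = path; }
  }

  private final LockTree lockTree = new LockTree();
  private Session lockSession;
  private long sessionId = -1;

  Lock lockTask(String lockPath, long batchSessionId) {
    if (lockSession == null || sessionId != batchSessionId) {
      // A new batch of queued tasks starts a fresh Session.
      lockSession = lockTree.getSession();
      // This assignment was the missing piece: without it the check above
      // never matched, so every task created and discarded its own Session.
      sessionId = batchSessionId;
    }
    return lockSession.lock(lockPath);
  }
}
{code}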
Some refactoring/clean-up was done in {{LockTree}}; I ran into some locking
inconsistencies that were easier to fix that way.
Some clean-up/refactoring of {{LockLevel}} in {{CollectionParams}}: the enum
order was reversed so that each level can reference its child locking level,
which makes the notion of “height” ({{isHigherOrEqual()}}) more intuitive.
Nothing fundamental.
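A hypothetical sketch of what such a reversed ordering looks like (field
names, heights and the comparison direction are my own, not the exact enum in
{{CollectionParams}}). Declaring the finest level first is what allows each
coarser level to reference its child, since enum constants can only reference
constants declared before them in constructor arguments:
{code:java}
// Hypothetical sketch, not the actual Solr enum.
enum LockLevelSketch {
  REPLICA(0, null),        // finest level, no child
  SHARD(1, REPLICA),
  COLLECTION(2, SHARD),
  CLUSTER(3, COLLECTION);  // coarsest level

  final int height;            // greater height = coarser (higher) lock
  final LockLevelSketch child; // next finer level, or null for REPLICA

  LockLevelSketch(int height, LockLevelSketch child) {
    this.height = height;
    this.child = child;
  }

  boolean isHigherOrEqual(LockLevelSketch that) {
    return height >= that.height; // reads naturally with this ordering
  }
}
{code}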
In {{OverseerTaskProcessor}}, added comments, removed a few always-true if
conditions and simplified task batch id management.
A fix was very likely doable with fewer code changes but, as always,
understanding and then fixing didn’t go in a straight line, and I believe
readability has improved (slightly), so I’m submitting the PR as is.
> OverseerTaskProcessor can process messages out of order
> -------------------------------------------------------
>
> Key: SOLR-14546
> URL: https://issues.apache.org/jira/browse/SOLR-14546
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: master (9.0)
> Reporter: Ilan Ginzburg
> Priority: Major
> Labels: collection-api, correctness, overseer
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *Summary:*
> Collection API and ConfigSet API messages can be processed out of order by
> the {{OverseerTaskProcessor}}/{{OverseerCollectionConfigSetProcessor}}
> because of the way locking is handled.
>
> *Some context:*
> The Overseer task processor dequeues messages from the Collection and
> ConfigSet ZooKeeper queue at {{/overseer/collection-queue-work}} and hands
> processing off to an executor pool running at most 100 parallel tasks
> ({{ClusterStateUpdater}}, on the other hand, uses a single thread to consume
> and process {{/overseer/queue}} and does not suffer from the problem
> described here).
> Locking prevents tasks modifying the same or related data from running
> concurrently. For the Collection API, locking can be done at {{CLUSTER}},
> {{COLLECTION}}, {{SHARD}} or {{REPLICA}} level and each command has its
> locking level defined in {{CollectionParams.CollectionAction}}.
> Multiple tasks for the same or competing locking levels do not execute
> concurrently. Commands locking at {{COLLECTION}} level for the same
> collection (for example {{CREATE}} creating a collection, {{CREATEALIAS}}
> creating an alias for the collection and {{CREATESHARD}} creating a new shard
> for the collection) will be executed sequentially, and it is expected that
> they are executed in submission order. Commands locking at different levels
> are also serialized when needed (for example changes to a shard of a
> collection and changes to the collection itself).
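>
> The rules above imply a simple conflict test: two tasks must be serialized
> when one task’s lock path is an ancestor of (or equal to) the other’s. A
> minimal, self-contained sketch of that test, using hypothetical path
> components rather than Solr’s actual {{LockTree}} representation:
> {code:java}
> import java.util.List;
>
> // Illustrative only: models a lock path as a list of components such as
> // ["c1"] (COLLECTION level) or ["c1", "shard1"] (SHARD level).
> final class LockConflictSketch {
>   /** Two hierarchical locks conflict iff one path is a prefix of the other. */
>   static boolean conflicts(List<String> a, List<String> b) {
>     int n = Math.min(a.size(), b.size());
>     for (int i = 0; i < n; i++) {
>       if (!a.get(i).equals(b.get(i))) return false; // disjoint subtrees
>     }
>     return true; // one lock covers the other -> serialize the tasks
>   }
>
>   public static void main(String[] args) {
>     // A COLLECTION-level lock on "c1" vs a SHARD-level lock on "c1/shard1":
>     System.out.println(conflicts(List.of("c1"), List.of("c1", "shard1"))); // true
>     // Different collections never conflict:
>     System.out.println(conflicts(List.of("c1"), List.of("c2", "shard1"))); // false
>   }
> }
> {code}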
>
> *The issue:*
> The way {{OverseerTaskProcessor.run()}} deals with messages that cannot be
> executed because competing messages are already running (i.e. the lock is
> unavailable) is not correct. The method basically iterates over the set of
> messages to execute, skips those that can’t be executed due to locking and
> executes those that can be (assuming enough executors remain available).
> The issue of out of order execution occurs when at least 3 messages compete
> for a lock. The following scenario shows out of order execution (messages
> numbered in enqueue order):
> * Message 1 gets the lock and runs,
> * Message 2 can’t be executed because the lock is taken,
> * Message 1 finishes execution and releases the lock,
> * Message 3 can be executed, gets the lock and runs,
> * Message 2 eventually runs when Message 3 finishes and releases the lock.
> We therefore get execution order 1 - 3 - 2 when the expected execution order
> is 1 - 2 - 3. Order 1 - 3 - 2 might not make sense, or might result in a
> different final state than 1 - 2 - 3.
> (There’s a variant of this scenario in which, after Message 2 fails to get
> the lock, the number of max parallel tasks is reached; the remaining tasks
> in {{heads}}, including Message 3, are then put into the {{blockedTasks}}
> map. These tasks are the first ones considered for processing on the next
> iteration of the main {{while}} loop, so the reordering is similar.)
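>
> A deterministic replay of this race as a self-contained sketch (illustrative
> only, not the Solr code): three messages compete for a single lock, the
> dispatcher scans them in enqueue order but merely skips a message whose lock
> is unavailable, and Message 1 releases the lock between the checks of
> Messages 2 and 3.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> final class OutOfOrderSketch {
>   static Integer lockHolder = null;                // the single contended lock
>   static final List<Integer> order = new ArrayList<>();
>
>   static boolean tryRun(int msg) {
>     if (lockHolder != null) return false;          // lock taken -> skip msg
>     lockHolder = msg;
>     order.add(msg);                                // task starts running
>     return true;
>   }
>
>   static void finish(int msg) {
>     if (lockHolder != null && lockHolder == msg) lockHolder = null;
>   }
>
>   public static void main(String[] args) {
>     tryRun(1);           // Message 1 acquires the lock and runs
>     tryRun(2);           // Message 2 skipped: lock held by Message 1
>     finish(1);           // Message 1 completes, releasing the lock
>     tryRun(3);           // Message 3 grabs the freed lock: runs before 2
>     finish(3);
>     tryRun(2);           // Message 2 only runs on a later scan
>     System.out.println(order);                     // prints [1, 3, 2]
>   }
> }
> {code}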
>
> Note that from the client perspective there do not need to be 3 outstanding
> tasks; 2 are sufficient. The third task required for observing the
> reordering could be enqueued after the first one has completed.
>
> *Impact of the issue:*
> This problem makes {{MultiThreadedOCPTest.testFillWorkQueue()}} fail because
> the test requires task execution in submission order (SOLR-14524).
>
> The messages involved in a reordering are not necessarily messages at the
> same locking level: a message locking at {{SHARD}} or {{REPLICA}} level
> prevents execution of a {{COLLECTION}} level message for the same collection.
> Sequences of messages can be constructed that lead to incorrect results or
> failures. For example, {{ADDREPLICA}} followed by {{CREATEALIAS}} followed by
> {{DELETEALIAS}} could see the alias creation and deletion reordered, which
> makes no sense (see the SolrJ sketch below).
> I wasn’t able to come up with a realistic production example that would be
> impacted by this issue (doesn't mean one doesn't exist!).
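>
> A hedged SolrJ sketch of that {{ADDREPLICA}} / {{CREATEALIAS}} /
> {{DELETEALIAS}} sequence; the collection, shard and alias names and the
> ZooKeeper address are made up. The requests are submitted with
> {{processAsync()}} so that they are outstanding concurrently, which is what
> allows the Overseer to reorder them:
> {code:java}
> import java.util.List;
> import java.util.Optional;
> import org.apache.solr.client.solrj.impl.CloudSolrClient;
> import org.apache.solr.client.solrj.request.CollectionAdminRequest;
>
> public class AliasReorderingSketch {
>   public static void main(String[] args) throws Exception {
>     try (CloudSolrClient client = new CloudSolrClient.Builder(
>         List.of("localhost:2181"), Optional.empty()).build()) {
>       // Each call enqueues a message on /overseer/collection-queue-work
>       // without waiting for it to execute.
>       CollectionAdminRequest.addReplicaToShard("c1", "shard1").processAsync(client);
>       CollectionAdminRequest.createAlias("a1", "c1").processAsync(client);
>       // With the buggy dispatcher, this DELETEALIAS could be processed
>       // before the CREATEALIAS it is supposed to undo.
>       CollectionAdminRequest.deleteAlias("a1").processAsync(client);
>     }
>   }
> }
> {code}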