[
https://issues.apache.org/jira/browse/SOLR-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson resolved SOLR-10181.
-----------------------------------
Resolution: Duplicate
Calling this a "duplicate" since it was fixed in SOLR-11444.
> CREATEALIAS and DELETEALIAS commands consistency problems under concurrency
> ---------------------------------------------------------------------------
>
> Key: SOLR-10181
> URL: https://issues.apache.org/jira/browse/SOLR-10181
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.3, 5.4, 5.5, 6.4.1
> Reporter: Samuel García Martínez
> Assignee: Erick Erickson
> Priority: Major
> Attachments: SOLR-10181_testcase.patch
>
>
> When the OCP runs several CREATEALIAS commands at the same time, some of
> their changes can be lost even though the API response for every request
> is OK.
> h3. The problem
> The problem happens because the CREATEALIAS command implementation relies on
> _zkStateReader.getAliases()_ to build the map that will be stored in ZK. If
> several threads reach that line at the same time, only one of the resulting
> maps survives; the other writes are overwritten.
> The code I'm referencing is [this
> piece|https://github.com/apache/lucene-solr/blob/8c1e67e30e071ceed636083532d4598bf6a8791f/solr/core/src/java/org/apache/solr/cloud/CreateAliasCmd.java#L65].
> As an example, say the current aliases map is {a:colA, b:colB}.
> If two CREATEALIAS commands (one adding c:colC and the other adding d:colD)
> are submitted to the _tpe_ and reach that line at the same time, the
> resulting maps will be {a:colA, b:colB, c:colC} and {a:colA, b:colB, d:colD}.
> Only one of them ends up stored in ZK, so the other alias is lost
> ("data loss") even though the API returned OK for both requests.
> On top of this, another concurrency problem can occur when the command
> checks whether the alias has been set using the _checkForAlias_ method. If
> the two CREATEALIAS ZK writes ran at the same time, the alias check for one
> of the threads can time out, since only one of the writes has "survived" and
> been "committed" to the _zkStateReader.getAliases()_ map.
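The lost update described above can be reproduced without Solr or ZooKeeper. The following is only a minimal sketch: the class `LostAliasUpdate` and its `simulate` method are hypothetical names, and a plain in-memory map stands in for the aliases data in ZK. It replays the interleaving from the example deterministically, with both commands taking their snapshot before either one writes.

```java
import java.util.HashMap;
import java.util.Map;

public class LostAliasUpdate {

    /**
     * Replays the interleaving from the example: both commands copy the
     * current map first, then each writes its whole copy back.
     * Hypothetical stand-in, not Solr code.
     */
    static Map<String, String> simulate() {
        // Stand-in for the aliases map currently stored in ZK.
        Map<String, String> stored = new HashMap<>(Map.of("a", "colA", "b", "colB"));

        // Both commands read the same snapshot before either one writes.
        Map<String, String> first = new HashMap<>(stored);
        first.put("c", "colC");
        Map<String, String> second = new HashMap<>(stored);
        second.put("d", "colD");

        // Unconditional writes: the second write replaces the first entirely.
        stored = first;
        stored = second;
        return stored;
    }

    public static void main(String[] args) {
        Map<String, String> result = simulate();
        System.out.println(result.containsKey("c")); // false: alias c was lost
        System.out.println(result.containsKey("d")); // true
    }
}
```

Both writes "succeed" from the caller's point of view, which matches the report that the API returns OK while one alias silently disappears.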
> h3. How to fix it
> I can post a patch for this if someone gives me directions on how it should
> be fixed. As I see it, there are two places where the issue can be fixed: in
> the processor (OverseerCollectionMessageHandler) in a generic way, or inside
> the command itself.
> h5. The processor fix
> The locking mechanism (_OverseerCollectionMessageHandler#lockTask_) would be
> the place to fix this inside the processor. I thought that adding the
> operation name to the locking key, instead of using only "collection" or
> "name", would fix the issue, but I realized that the problem would still
> happen when different operations modify the same resource concurrently
> (as CREATEALIAS and DELETEALIAS do). So if this is the path to follow, I
> don't know what should be used as the locking key.
> h5. The command fix
> Fixing it at the command level (_CreateAliasCmd_ and _DeleteAliasCmd_) would
> be relatively easy using optimistic locking, i.e., passing the aliases.json
> ZK version to keeper.setData. To do that, the Aliases class should expose
> the aliases version so the commands can send that version with the update
> and retry when it fails.
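The optimistic-locking idea in the command fix can be sketched as a read, modify, conditional write, retry loop. This is an illustration under stated assumptions, not the change that went into SOLR-11444: `OptimisticAliasUpdate`, `AliasStore`, `Versioned` and `compareAndSet` are hypothetical names, and the in-memory store only mimics the behaviour of ZooKeeper's versioned `setData(path, data, version)`, which rejects a write whose expected version no longer matches the znode.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticAliasUpdate {

    /** Immutable snapshot: the alias map plus the version it was read at. */
    static final class Versioned {
        final Map<String, String> aliases;
        final int version;

        Versioned(Map<String, String> aliases, int version) {
            this.aliases = Map.copyOf(aliases);
            this.version = version;
        }
    }

    /** Hypothetical in-memory stand-in for the aliases.json znode. */
    static final class AliasStore {
        private Versioned current = new Versioned(Map.of(), 0);

        synchronized Versioned read() {
            return current;
        }

        /** Writes only if nobody else has written since expectedVersion was read. */
        synchronized boolean compareAndSet(Map<String, String> next, int expectedVersion) {
            if (current.version != expectedVersion) {
                return false; // stale snapshot: reject instead of overwriting
            }
            current = new Versioned(next, expectedVersion + 1);
            return true;
        }
    }

    /** Re-reads and retries until the conditional write is accepted. */
    static void createAlias(AliasStore store, String alias, String collection) {
        while (true) {
            Versioned seen = store.read();
            Map<String, String> next = new HashMap<>(seen.aliases);
            next.put(alias, collection);
            if (store.compareAndSet(next, seen.version)) {
                return;
            }
            // Another writer got in first; loop to rebuild from fresh data.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AliasStore store = new AliasStore();
        Thread t1 = new Thread(() -> createAlias(store, "c", "colC"));
        Thread t2 = new Thread(() -> createAlias(store, "d", "colD"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(store.read().aliases.size()); // 2: neither alias is lost
    }
}
```

Because a rejected writer rebuilds its map from the latest snapshot, the same loop would also cover a mix of CREATEALIAS and DELETEALIAS without needing a shared locking key. A real implementation would probably bound the number of retries.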