[ 
https://issues.apache.org/jira/browse/SOLR-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-10181.
-----------------------------------
    Resolution: Duplicate

Calling this a "duplicate" since it was fixed in SOLR-11444

> CREATEALIAS and DELETEALIAS commands consistency problems under concurrency
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-10181
>                 URL: https://issues.apache.org/jira/browse/SOLR-10181
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.3, 5.4, 5.5, 6.4.1
>            Reporter: Samuel García Martínez
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: SOLR-10181_testcase.patch
>
>
> When several CREATEALIAS are run at the same time by the OCP it could happen 
> that, even tho the API response is OK, some of those CREATEALIAS request 
> changes are lost.
> h3. The problem
> The problem happens because the CREATEALIAS cmd implementation relies on 
> _zkStateReader.getAliases()_ to create the map that will be stored in ZK. If 
> several threads reach that line at the same time it will happen that only one 
> will be stored correctly and the others will be overridden.
> The code I'm referencing is [this 
> piece|https://github.com/apache/lucene-solr/blob/8c1e67e30e071ceed636083532d4598bf6a8791f/solr/core/src/java/org/apache/solr/cloud/CreateAliasCmd.java#L65].
>  As an example, let's say that the current aliases map has {a:colA, b:colB}. 
> If two CREATEALIAS (one adding c:colC and other creating d:colD) are 
> submitted to the _tpe_ and reach that line at the same time, the resulting 
> maps will look like {a:colA, b:colB, c:colC} and {a:colA, b:colB, d:colD} and 
> only one of them will be stored correctly in ZK, resulting in "data loss", 
> meaning that API is returning OK despite that it didn't work as expected.
> On top of this, another concurrency problem could happen when the command 
> checks if the alias has been set using _checkForAlias_ method. if these two 
> CREATEALIAS zk writes had ran at the same time, the alias check fir one of 
> the threads can timeout since only one of the writes has "survived" and has 
> been "committed" to the _zkStateReader.getAliases()_ map.
> h3. How to fix it
> I can post a patch to this if someone gives me directions on how it should be 
> fixed. As I see this, there are two places where the issue can be fixed: in 
> the processor (OverseerCollectionMessageHandler) in a generic way or inside 
> the command itself.
> h5. The processor fix
> The locking mechanism (_OverseerCollectionMessageHandler#lockTask_) should be 
> the place to fix this inside the processor. I thought that adding the 
> operation name instead of only "collection" or "name" to the locking key 
> would fix the issue, but I realized that the problem will happen anyway if 
> the concurrency happens between different operations modifying the same 
> resource (like CREATEALIAS and DELETEALIAS do). So, if this should be the 
> path to follow I don't know what should be used as a locking key.
> h5. The command fix
> Fixing it at the command level (_CreateAliasCmd_ and _DeleteAliasCmd_) would 
> be relatively easy. Using optimistic locking, i.e, using the aliases.json zk 
> version in the keeper.setData. To do that, Aliases class should offer the 
> aliases version so the commands can forward that version with the update and 
> retry when it fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to