[ https://issues.apache.org/jira/browse/SOLR-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668344#comment-13668344 ]
Yonik Seeley commented on SOLR-4744:
------------------------------------

bq. What happens with partial updates in that case? Suppose an increment operation is requested which succeeds locally but is not propagated to the sub-shard.

If we're talking about failures due to the sub-shard already being active when it receives an update from the old shard that still thinks it is the leader, then I think we're fine. This isn't a new failure mode, just another way for the old shard to end up out of date. For example, once a normal update is received by the new shard, the old shard will be out of date anyway.

bq. If the client retries, the index will have wrong values.

If the client retries against the same old shard that is no longer the leader, the update will simply fail again, because the sub-shard will reject it again. We could perhaps return an error code suggesting that the client is using stale cluster state (i.e. re-read the cluster state before trying the update again).
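To make that suggestion concrete, here is a minimal client-side sketch (not part of the attached patch) of how a retry could react to such an error code. The STALE_STATE_CODE constant, the reuse of HTTP 409/CONFLICT as that code, and the forced ZkStateReader refresh are illustrative assumptions against the 4.x SolrJ API, not an existing Solr contract.

{code}
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class StaleStateRetryExample {

  // Hypothetical code the cluster could return when the targeted shard is no
  // longer the leader (e.g. it went inactive after a split). 409/CONFLICT is
  // used here purely as a placeholder.
  private static final int STALE_STATE_CODE = SolrException.ErrorCode.CONFLICT.code;

  /** Sends one document, retrying once after refreshing the cached cluster state. */
  public static void addWithRetry(CloudSolrServer cloud, SolrInputDocument doc)
      throws SolrServerException, IOException {
    for (int attempt = 0; ; attempt++) {
      try {
        cloud.add(doc);
        return;
      } catch (SolrException e) {
        // Anything other than the (assumed) stale-state rejection, or a second
        // failure in a row, is propagated to the caller.
        if (e.code() != STALE_STATE_CODE || attempt > 0) {
          throw e;
        }
        // Re-read the cluster state so the retry is routed to the new
        // sub-shard leader rather than the now-inactive parent shard. The
        // ZooKeeper watch on /clusterstate.json would normally refresh this
        // cached state on its own shortly after the split anyway.
        cloud.getZkStateReader().updateClusterState(true);
      }
    }
  }
}
{code}

The point is only that the client treats the error as "my routing table is stale" rather than "the update itself is bad": with a cloud-aware client the refresh is nearly free once the NodeDataChanged watch on /clusterstate.json (visible in the log below) has fired.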
> Version conflict error during shard split test
> ----------------------------------------------
>
>                 Key: SOLR-4744
>                 URL: https://issues.apache.org/jira/browse/SOLR-4744
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.3
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 4.4
>
>         Attachments: SOLR-4744.patch
>
>
> ShardSplitTest fails sometimes with the following error:
> {code}
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.861; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state invoked for collection: collection1
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.861; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1 to inactive
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.861; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1_0 to active
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.861; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1_1 to active
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.873; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= path=/update params={wt=javabin&version=2} {add=[169 (1432319507166134272)]} 0 2
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.877; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.884; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_shard1_1_replica1] webapp= path=/update params={distrib.from=http://127.0.0.1:41028/collection1/&update.distrib=FROMLEADER&wt=javabin&distrib.from.parent=shard1&version=2} {} 0 1
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.885; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= path=/update params={distrib.from=http://127.0.0.1:41028/collection1/&update.distrib=FROMLEADER&wt=javabin&distrib.from.parent=shard1&version=2} {add=[169 (1432319507173474304)]} 0 2
> [junit4:junit4] 1> ERROR - 2013-04-14 19:05:26.885; org.apache.solr.common.SolrException; shard update error StdNode: http://127.0.0.1:41028/collection1_shard1_1_replica1/:org.apache.solr.common.SolrException: version conflict for 169 expected=1432319507173474304 actual=-1
> [junit4:junit4] 1>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:404)
> [junit4:junit4] 1>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> [junit4:junit4] 1>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> [junit4:junit4] 1>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> [junit4:junit4] 1>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> [junit4:junit4] 1>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> [junit4:junit4] 1>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> [junit4:junit4] 1>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> [junit4:junit4] 1>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> [junit4:junit4] 1>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> [junit4:junit4] 1>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [junit4:junit4] 1>     at java.lang.Thread.run(Thread.java:679)
> [junit4:junit4] 1>
> [junit4:junit4] 1> INFO - 2013-04-14 19:05:26.886; org.apache.solr.update.processor.DistributedUpdateProcessor; try and ask http://127.0.0.1:41028 to recover
> {code}
> The failure is hard to reproduce and very timing sensitive. These kinds of failures have always been seen right after the "updateshardstate" action.