[ https://issues.apache.org/jira/browse/SOLR-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254288#comment-15254288 ]

Shalin Shekhar Mangar commented on SOLR-9030:
---------------------------------------------

It exists to ensure that we do not update/overwrite a cluster state if we have 
no idea of its previous znode version. Also, the default znode version in a 
DocCollection is -1, and ZooKeeper treats a version of -1 as "match any 
version". If left unchecked, ZK would overwrite the stored state without the 
compare-and-set (CAS) checks that we rely on.
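To make the -1 hazard concrete, here is a minimal sketch in Java. It assumes a hypothetical in-memory stand-in for a znode (`ZnodeCasSketch`, `VersionedNode` and `writeState` are illustrative names, not Solr or ZooKeeper APIs). The only behaviour it borrows from ZooKeeper is the version rule of `setData`: an expected version of -1 matches any version, so the write becomes an unconditional overwrite, while any other value must equal the current version.

```java
// Hypothetical in-memory stand-in for a ZooKeeper znode. It mimics only the
// version rule of ZooKeeper's setData(path, data, version); not Solr or ZK code.
class ZnodeCasSketch {

    /** Stand-in for ZooKeeper's KeeperException.BadVersionException. */
    static class BadVersionException extends Exception {
        BadVersionException(int expected, int actual) {
            super("expected znode version " + expected + " but found " + actual);
        }
    }

    static class VersionedNode {
        private String data;
        private int version = 0; // a freshly created znode starts at version 0

        VersionedNode(String initial) { this.data = initial; }

        synchronized String data() { return data; }
        synchronized int version() { return version; }

        // ZooKeeper rule: expectedVersion == -1 matches any version (unconditional
        // overwrite); otherwise it must equal the current version (compare-and-set).
        synchronized int setData(String newData, int expectedVersion) throws BadVersionException {
            if (expectedVersion != -1 && expectedVersion != version) {
                throw new BadVersionException(expectedVersion, version);
            }
            data = newData;
            return ++version;
        }
    }

    // Hypothetical writer-side guard: -1 means "we never learned the znode version",
    // so refuse to write rather than silently bypass the CAS check.
    static int writeState(VersionedNode node, String state, int knownVersion) throws BadVersionException {
        if (knownVersion == -1) {
            throw new IllegalStateException("unknown znode version; refusing unconditional overwrite");
        }
        return node.setData(state, knownVersion);
    }

    public static void main(String[] args) throws Exception {
        VersionedNode node = new VersionedNode("state-A");
        // Without the guard, version -1 clobbers whatever is stored.
        node.setData("stale-state", -1);
        System.out.println(node.data() + " @ v" + node.version()); // stale-state @ v1
        try {
            writeState(node, "stale-state-2", -1);
        } catch (IllegalStateException e) {
            System.out.println("guard: " + e.getMessage());
        }
        // A CAS write with the correct version succeeds and bumps the version.
        System.out.println(writeState(node, "state-B", 1)); // 2
    }
}
```

The guard trades a possible retry for safety: a write with an unknown version fails loudly instead of silently replacing whatever another node stored in the meantime.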

bq. And shouldn't we expect that that can happen and deal with it 
appropriately? (A retry or something?)

Yes, and it does recover automatically. A BadVersionException causes the 
complete cluster state to be re-fetched from ZK, and the operation is retried. 
In production environments, the BadVersionException will not be a problem, but 
the overwriting of state can be.
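The recovery path can be sketched the same way. This is a minimal, hypothetical illustration of a re-read-and-retry loop around a conditional write, not the actual Overseer/ZkStateWriter code; the names `CasRetrySketch`, `VersionedNode` and `updateWithRetry` are invented, and the competing writer is simulated inside the update function so the race is deterministic.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch of "on BadVersionException, re-read and retry". The classes
// are in-memory stand-ins, not Solr's Overseer/ZkStateWriter or ZooKeeper's API.
class CasRetrySketch {

    /** Stand-in for ZooKeeper's KeeperException.BadVersionException. */
    static class BadVersionException extends Exception {}

    /** Data of a node together with the version it was read at. */
    static final class Snapshot {
        final String data;
        final int version;
        Snapshot(String data, int version) { this.data = data; this.version = version; }
    }

    static class VersionedNode {
        private String data;
        private int version = 0;

        VersionedNode(String initial) { data = initial; }

        synchronized Snapshot read() { return new Snapshot(data, version); }

        // Compare-and-set: succeed only if nobody has written since our read.
        synchronized void setData(String newData, int expectedVersion) throws BadVersionException {
            if (expectedVersion != version) throw new BadVersionException();
            data = newData;
            version++;
        }
    }

    // Applies `update` to the freshest state, re-reading and retrying whenever the
    // conditional write loses a race. Returns the number of attempts it took.
    static int updateWithRetry(VersionedNode node, UnaryOperator<String> update, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Snapshot snap = node.read(); // re-fetch the full state on every attempt
            try {
                node.setData(update.apply(snap.data), snap.version);
                return attempt;
            } catch (BadVersionException e) {
                // Someone else wrote first; our view is stale, so loop and re-read.
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("A");
        boolean[] raced = {false};
        int attempts = updateWithRetry(node, current -> {
            if (!raced[0]) {
                raced[0] = true;
                // Simulate another writer sneaking in between our read and our write.
                try { node.setData("B", node.read().version); }
                catch (BadVersionException e) { throw new IllegalStateException(e); }
            }
            return current + "+down";
        }, 3);
        System.out.println(attempts + " " + node.read().data); // 2 B+down
    }
}
```

The retried write lands on top of the concurrent change ("B+down") instead of erasing it. An unconditional (-1) write on the first attempt would have stored "A+down" and silently dropped "B", which is the state-overwrite risk that the version check guards against.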

> The 'downnode' command can trip asserts in ZkStateWriter or cause BadVersionException in Overseer
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9030
>                 URL: https://issues.apache.org/jira/browse/SOLR-9030
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>             Fix For: master, 6.1
>
>
> While working on SOLR-9014 I came across a strange test failure.
> {code}
>    [junit4] ERROR   16.9s | AsyncCallRequestStatusResponseTest.testAsyncCallStatusResponse <<<
>    [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=46, name=OverseerStateUpdate-95769832112259076-127.0.0.1:51135_z_oeg%2Ft-n_0000000000, state=RUNNABLE, group=Overseer state updater.]
>    [junit4]    >      at __randomizedtesting.SeedInfo.seed([91F68DA7E10807C3:CBF7E84BCF328A1A]:0)
>    [junit4]    > Caused by: java.lang.AssertionError
>    [junit4]    >      at __randomizedtesting.SeedInfo.seed([91F68DA7E10807C3]:0)
>    [junit4]    >      at org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:231)
>    [junit4]    >      at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:240)
>    [junit4]    >      at java.lang.Thread.run(Thread.java:745)
> {code}
> The underlying problem can manifest either by tripping the above assert or 
> as a BadVersionException. I found that this was introduced in SOLR-7281, 
> where a new 'downnode' command was added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
