[ 
https://issues.apache.org/jira/browse/SOLR-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679319#comment-14679319
 ] 

Scott Blum commented on SOLR-7869:
----------------------------------

1) So this doesn't actually fix anything yet, because there are no changes to 
Overseer itself?  Presumably you'd need to catch BadVersionException (BVE) in 
Overseer and force-refresh the reader's clusterState?

2) Just noting that this seems to be the opposite of what we discussed earlier.  
I interpreted your earlier comments to mean that we should blow away the ZK 
data in favor of the overseer data, since overseer is authoritative.  This 
patch seems to do the opposite, preferring external user changes.  To wit: "it 
is guaranteed that overwriting cluster state with prevState will not discard 
any updates that Overseer had performed unless such an act was done externally 
by the user".

3) In ZkStateWriterTest, I note that ZkStateWriter isn't super amenable to 
testing.  It's kind of subtle that enqueuing an update sometimes causes a 
flush and sometimes doesn't.  Dunno if it would be better or worse to have 
test-visible methods for doing a queue-without-flush and then an explicit 
flush.

4) In ZkStateWriterTest.testExternalModificationToSharedClusterState(), first 
try block, you're missing a fail() after the enqueueUpdate to test that the 
exception really did occur.  In the first catch block, I'm not sure it's good 
to log the expected exception; I always find it confusing when tests log 
exceptions that don't actually cause the test to fail.  I would remove the 
second catch block; if you get any exception other than the one you expect, 
it's best to just let it escape and let the test framework handle it.

5) In a similar fashion, I would remove the second try/catch block entirely, 
just keeping the body of the try.  You expect that none of it will throw an 
exception, so just leave it unadorned and the test framework will handle it 
if one is thrown.
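
A minimal, self-contained sketch of the test shape described in points 4 and 5 
might look like the following.  It is not the code from the patch: 
VersionedStore, VersionConflictException and runScenario are hypothetical 
stand-ins for the ZK-backed writer, BadVersionException and the test method, 
and a thrown AssertionError plays the role of the test framework's fail().

```java
// Hypothetical stand-ins; none of these names come from the Solr code base.
class ExpectedExceptionSketch {

    /**
     * Stand-in for ZooKeeper's BadVersionException. Unchecked here only to keep
     * the sketch short; the real one is a checked KeeperException subclass.
     */
    static class VersionConflictException extends RuntimeException {
        VersionConflictException(String msg) { super(msg); }
    }

    /** Minimal versioned store: a write succeeds only if the caller's version matches. */
    static class VersionedStore {
        private int version = 0;

        void write(int expectedVersion) {
            if (expectedVersion != version) {
                throw new VersionConflictException(
                    "expected " + expectedVersion + " but was " + version);
            }
            version++;
        }

        int version() { return version; }
    }

    /** Runs the scenario and returns the store version once it completes. */
    static int runScenario() {
        VersionedStore store = new VersionedStore();
        store.write(0); // simulates the external modification; version is now 1

        try {
            store.write(0); // stale version: must be rejected
            // Plays the role of fail(): reached only if no exception was thrown.
            throw new AssertionError("expected VersionConflictException");
        } catch (VersionConflictException expected) {
            // Expected, so nothing is logged. There is no second catch block:
            // any other exception escapes to the caller.
        }

        // Happy path left unadorned: an unexpected exception fails the test by itself.
        store.write(1);
        return store.version();
    }

    public static void main(String[] args) {
        System.out.println("final version = " + runScenario());
    }
}
```

Running main prints "final version = 2".  If the stale write were ever 
accepted, the AssertionError would escape instead, because the catch clause 
only matches the expected exception type.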

> Overseer does not handle BadVersionException correctly
> ------------------------------------------------------
>
>                 Key: SOLR-7869
>                 URL: https://issues.apache.org/jira/browse/SOLR-7869
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.2.1
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>              Labels: difficulty-medium, impact-low
>             Fix For: Trunk, 5.4
>
>         Attachments: SOLR-7869.patch, SOLR-7869.patch
>
>
> If the /clusterstate.json is modified externally then the Overseer can go 
> into an infinite loop upon a BadVersionException alternately trying to 
> execute main queue and then the work queue:
> {code}
> ERROR - 2015-08-04 18:49:56.224; [   ] 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Exception in Overseer 
> work queue loop
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /clusterstate.json
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
>         at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)
>         at 
> org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:180)
>         at 
> org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:67)
>         at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:286)
>         at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:168)
>         at java.lang.Thread.run(Thread.java:745)
> INFO  - 2015-08-04 18:49:56.224; [   ] 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; processMessage: 
> queueSize: 1, message = {
>   "operation":"state",
>   "state":"down",
>   "base_url":"http://127.0.1.1:7574/solr",
>   "core":"test_shard1_replica1",
>   "roles":null,
>   "node_name":"127.0.1.1:7574_solr",
>   "shard":null,
>   "collection":"test",
>   "core_node_name":"core_node1"} current state version: 9
> INFO  - 2015-08-04 18:49:56.224; [   ] 
> org.apache.solr.cloud.overseer.ReplicaMutator; Update state numShards=null 
> message={
>   "operation":"state",
>   "state":"down",
>   "base_url":"http://127.0.1.1:7574/solr",
>   "core":"test_shard1_replica1",
>   "roles":null,
>   "node_name":"127.0.1.1:7574_solr",
>   "shard":null,
>   "collection":"test",
>   "core_node_name":"core_node1"}
> INFO  - 2015-08-04 18:49:56.224; [   ] 
> org.apache.solr.cloud.overseer.ReplicaMutator; shard=shard1 is already 
> registered
> ERROR - 2015-08-04 18:49:56.225; [   ] 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Exception in Overseer 
> main queue loop
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /clusterstate.json
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
>         at 
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
>         at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
>         at 
> org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)
>         at 
> org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:180)
>         at 
> org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:67)
>         at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:286)
>         at 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:213)
>         at java.lang.Thread.run(Thread.java:745)
> INFO  - 2015-08-04 18:49:56.225; [   ] 
> org.apache.solr.common.cloud.ZkStateReader; Updating data for gettingstarted 
> to ver 8
> {code}



