[
https://issues.apache.org/jira/browse/SOLR-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679319#comment-14679319
]
Scott Blum commented on SOLR-7869:
----------------------------------
1) So this doesn't actually fix anything yet, because there are no changes to
Overseer itself? Presumably you'd need to catch BadVersionException (BVE) in
Overseer and force-refresh the reader's clusterState?
2) Just noting that this seems the opposite of what we discussed earlier. I
interpreted your earlier comments to mean that we should blow away the ZK data
in favor of the Overseer data, since Overseer is authoritative. This patch
seems to do the opposite, preferring external user changes. To wit: "it is
guaranteed that overwriting cluster state with prevState will not discard any
updates that Overseer had performed unless such an act was done externally by
the user".
3) In ZkStateWriterTest, I note that ZkStateWriter isn't super amenable to
testing: it's kind of subtle that enqueuing an update sometimes causes a flush
and sometimes doesn't. Dunno if it's better or worse to have test-visible
methods for doing a queue-without-flush and then an explicit flush.
4) In ZkStateWriterTest.testExternalModificationToSharedClusterState(), first
try block, you're missing a fail() after the enqueueUpdate to verify that the
exception really did occur. In the first catch block, I'm not sure it's good
to log the expected exception; I always find it confusing when tests log
exceptions that don't actually cause the test to fail. I would remove the
second catch block: if you get any exception other than the one you expect,
best to just let it escape and let the test framework catch it.
5) In a similar fashion, I would remove the second try/catch block entirely,
just keeping the body of the try. You expect that none of it will throw an
exception, so just leave it unadorned and the test framework will handle it if
one is thrown.
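To illustrate points 4 and 5, here is a minimal, self-contained sketch of the
test shape I have in mind. Everything in it is a hypothetical stand-in:
VersionedNode, Writer and the local BadVersionException are not the real
ZkStateWriter or ZooKeeper types, and the local fail() stands in for JUnit's
Assert.fail(). It only shows where the fail() goes and which try/catch blocks
can be dropped.

```java
public class ExpectedExceptionSketch {

    /** Hypothetical stand-in for ZooKeeper's KeeperException.BadVersionException. */
    static class BadVersionException extends Exception {
        BadVersionException(String message) { super(message); }
    }

    /** Hypothetical stand-in for a versioned znode such as /clusterstate.json. */
    static class VersionedNode {
        private int version = 0;

        /** Succeeds only if the caller's expected version matches, then bumps it. */
        synchronized void setData(int expectedVersion) throws BadVersionException {
            if (expectedVersion != version) {
                throw new BadVersionException(
                    "expected " + expectedVersion + " but node is at " + version);
            }
            version++;
        }

        synchronized int version() { return version; }
    }

    /** Hypothetical writer that remembers the last version it wrote. */
    static class Writer {
        private final VersionedNode node;
        private int knownVersion;

        Writer(VersionedNode node) {
            this.node = node;
            this.knownVersion = node.version();
        }

        void enqueueUpdate() throws BadVersionException {
            node.setData(knownVersion);
            knownVersion++; // only reached when the write succeeded
        }
    }

    /** Local stand-in for JUnit's Assert.fail(). */
    static void fail(String message) { throw new AssertionError(message); }

    static boolean testExternalModification() throws Exception {
        VersionedNode node = new VersionedNode();
        Writer writer = new Writer(node);

        // Point 5: this write is expected to succeed, so no try/catch around it.
        // Any exception escapes and the test framework reports it.
        writer.enqueueUpdate();

        // Simulate an external edit that bumps the version behind the writer's back.
        node.setData(node.version());

        // Point 4: fail() right after the call that must throw, catch only the
        // expected exception, and do not log it.
        try {
            writer.enqueueUpdate();
            fail("expected BadVersionException after external modification");
        } catch (BadVersionException expected) {
            // expected
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        testExternalModification();
        System.out.println("ok");
    }
}
```

Because fail() throws an AssertionError, which is not a BadVersionException,
it is not swallowed by the catch block, so a missing exception fails the test.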
> Overseer does not handle BadVersionException correctly
> ------------------------------------------------------
>
> Key: SOLR-7869
> URL: https://issues.apache.org/jira/browse/SOLR-7869
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 5.2.1
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Labels: difficulty-medium, impact-low
> Fix For: Trunk, 5.4
>
> Attachments: SOLR-7869.patch, SOLR-7869.patch
>
>
> If the /clusterstate.json is modified externally then the Overseer can go
> into an infinite loop upon a BadVersionException alternately trying to
> execute main queue and then the work queue:
> {code}
> ERROR - 2015-08-04 18:49:56.224; [ ]
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Exception in Overseer
> work queue loop
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /clusterstate.json
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
> at
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
> at
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
> at
> org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)
> at
> org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:180)
> at
> org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:67)
> at
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:286)
> at
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:168)
> at java.lang.Thread.run(Thread.java:745)
> INFO - 2015-08-04 18:49:56.224; [ ]
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; processMessage:
> queueSize: 1, message = {
> "operation":"state",
> "state":"down",
> "base_url":"http://127.0.1.1:7574/solr",
> "core":"test_shard1_replica1",
> "roles":null,
> "node_name":"127.0.1.1:7574_solr",
> "shard":null,
> "collection":"test",
> "core_node_name":"core_node1"} current state version: 9
> INFO - 2015-08-04 18:49:56.224; [ ]
> org.apache.solr.cloud.overseer.ReplicaMutator; Update state numShards=null
> message={
> "operation":"state",
> "state":"down",
> "base_url":"http://127.0.1.1:7574/solr",
> "core":"test_shard1_replica1",
> "roles":null,
> "node_name":"127.0.1.1:7574_solr",
> "shard":null,
> "collection":"test",
> "core_node_name":"core_node1"}
> INFO - 2015-08-04 18:49:56.224; [ ]
> org.apache.solr.cloud.overseer.ReplicaMutator; shard=shard1 is already
> registered
> ERROR - 2015-08-04 18:49:56.225; [ ]
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Exception in Overseer
> main queue loop
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /clusterstate.json
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
> at
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:362)
> at
> org.apache.solr.common.cloud.SolrZkClient$8.execute(SolrZkClient.java:359)
> at
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
> at
> org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:359)
> at
> org.apache.solr.cloud.overseer.ZkStateWriter.writePendingUpdates(ZkStateWriter.java:180)
> at
> org.apache.solr.cloud.overseer.ZkStateWriter.enqueueUpdate(ZkStateWriter.java:67)
> at
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.processQueueItem(Overseer.java:286)
> at
> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:213)
> at java.lang.Thread.run(Thread.java:745)
> INFO - 2015-08-04 18:49:56.225; [ ]
> org.apache.solr.common.cloud.ZkStateReader; Updating data for gettingstarted
> to ver 8
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)