[
https://issues.apache.org/jira/browse/SOLR-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated SOLR-5552:
------------------------------
Attachment: SOLR-5552.patch
I took a couple of hours to look at this today. Here is a patch that fixes a few
things; I will probably file another JIRA issue or two around them.
* First, it registers cores in ZooKeeper on startup in background threads (a
sketch of the idea follows after this list). It turns out it's not simple to
know when HTTP is up, but in my testing things work out all right as long as we
don't block in filter#init while loading the cores. Regardless, it's a large
improvement, and this was a serious bug. We have never had enough cluster-restart
tests. As it turns out, though, this seems difficult to reproduce in tests for
some reason, while it is easy to reproduce by hand.
* Second, there was a problem where, in determining whether we should become the
leader, we checked to see if anyone else was active. That is really no good; I
thought I had removed that check before.
* Third, we could start recovering from a leader that was still in the middle of
replaying its transaction log, which is nasty because the pre-replication commit
can be ignored and those updates are not distributed.
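As a rough illustration of the first point, here is a minimal sketch, not the
actual patch: the executor size and the registerInZk helper are placeholders I
made up. The idea is simply to push the per-core ZooKeeper registration onto
background threads so core loading does not block in filter#init while HTTP
comes up:
{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BackgroundZkRegistration {
    // Small pool dedicated to ZK registration work; the size is an arbitrary placeholder.
    private final ExecutorService zkRegisterExecutor = Executors.newFixedThreadPool(4);

    void registerCores(List<String> coreNames) {
        for (String core : coreNames) {
            // Each registration (live node entry, leader election, etc.) runs off the
            // startup thread, so Jetty can start accepting connections in parallel
            // instead of waiting for ZK work to finish in filter#init.
            zkRegisterExecutor.submit(() -> registerInZk(core));
        }
    }

    private void registerInZk(String coreName) {
        // Placeholder for the real ZooKeeper registration / leader election logic.
        System.out.println("registering " + coreName + " in ZooKeeper");
    }
}
{code}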
> Leader recovery process can select the wrong leader if all replicas for a
> shard are down and trying to recover
> --------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-5552
> URL: https://issues.apache.org/jira/browse/SOLR-5552
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Reporter: Timothy Potter
> Labels: leader, recovery
> Fix For: 5.0, 4.7, 4.6.1
>
> Attachments: SOLR-5552.patch, SOLR-5552.patch
>
>
> This is one particular issue that leads to out-of-sync shards, related to SOLR-4260.
> Here's what I know so far, which admittedly isn't much:
> As cloud85 (the replica before it crashed) is initializing, it enters the wait
> process in ShardLeaderElectionContext#waitForReplicasToComeUp; this is
> expected and a good thing.
> A short time later, cloud84 (the leader before it crashed) begins initializing
> and gets to the point where it adds itself as a possible leader for the shard
> (by creating a znode under /collections/cloud/leaders_elect/shard1/election),
> which allows cloud85 to return from waitForReplicasToComeUp and try to
> determine who should be the leader.
> cloud85 then tries to run the SyncStrategy, which can never work because in
> this scenario the Jetty HTTP listener is not active yet on either node, so
> all replication work that uses HTTP requests fails on both nodes ... PeerSync
> treats these failures as indicators that the other replicas in the shard are
> unavailable and assumes success. Here's the log message:
> 2013-12-11 11:43:25,936 [coreLoadExecutor-3-thread-1] WARN
> solr.update.PeerSync - PeerSync: core=cloud_shard1_replica1
> url=http://cloud85:8985/solr couldn't connect to
> http://cloud84:8984/solr/cloud_shard1_replica2/, counting as success
> The Jetty HTTP listener doesn't start accepting connections until long after
> this process has completed and already selected the wrong leader.
> From what I can see, we seem to have a leader recovery process that is based
> partly on HTTP requests to the other nodes, but the HTTP listener on those
> nodes isn't active yet. We need a leader recovery process that doesn't rely
> on HTTP requests. Perhaps leader recovery for a shard without a current leader
> needs to work differently from leader election in a shard that has replicas
> that can respond to HTTP requests? All of what I'm seeing makes perfect sense
> for leader election when there are active replicas and the current leader
> fails.
> All this aside, I'm not asserting that this is the only cause for the
> out-of-sync issues reported in this ticket, but it definitely seems like it
> could happen in a real cluster.
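To make the failure mode described above concrete, here is a small illustrative
sketch. It is not the actual PeerSync code; the class, method names, and
structure are simplified assumptions. It shows how counting a connection failure
as success lets a node whose peers have no HTTP listener yet conclude that the
sync succeeded:
{code:java}
import java.io.IOException;
import java.net.ConnectException;
import java.util.List;

class PeerSyncSketch {
    // Stand-in for the HTTP versions request; in the scenario above, Jetty is not
    // listening yet on the peer, so every request fails with a connect error.
    private boolean requestVersions(String peerUrl) throws IOException {
        throw new ConnectException("Connection refused: " + peerUrl);
    }

    boolean sync(String myUrl, List<String> peerUrls) {
        for (String peer : peerUrls) {
            try {
                if (!requestVersions(peer)) {
                    return false; // the peer answered and we are missing updates
                }
            } catch (IOException e) {
                // The problematic branch: an unreachable peer is counted as success,
                // so a node whose peers have not started HTTP yet can conclude the
                // sync succeeded and go on to become leader.
                System.err.println("PeerSync: url=" + myUrl
                        + " couldn't connect to " + peer + ", counting as success");
            }
        }
        return true;
    }
}
{code}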
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]