[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846755#comment-13846755
 ] 

Timothy Potter commented on SOLR-4260:
--------------------------------------

I'm sorry for being unclear; "waiting" was probably the wrong term ... and the
nodes definitely continue right on down the path of selecting the wrong leader.

Here's what I know so far, which admittedly isn't much:

As cloud85 (the replica before it crashed) is initializing, it enters the wait 
process in ShardLeaderElectionContext#waitForReplicasToComeUp; this is expected 
and a good thing.
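
To make the sequence easier to follow: conceptually the wait step is just a
bounded poll. The sketch below is a simplified illustration, not the actual
ShardLeaderElectionContext code, and countCandidates() is a made-up stand-in
for counting the znodes under the shard's election path:

    // Simplified illustration of the wait step; NOT the actual Solr code.
    public class WaitSketch {
        // Hypothetical stand-in for counting znodes under .../shard1/election.
        static int countCandidates() { return 1; }

        static void waitForReplicasToComeUp(int expectedReplicas, long timeoutMs)
                throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (System.currentTimeMillis() < deadline) {
                if (countCandidates() >= expectedReplicas) {
                    return;            // everyone we expect has registered; stop waiting
                }
                Thread.sleep(500);     // poll until the others show up or we time out
            }
            // timed out: fall through and try to elect a leader with whoever is present
        }
    }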

A short time later, cloud84 (the leader before it crashed) begins initializing 
and reaches the point where it registers itself as a possible leader for the 
shard (by creating a znode under /collections/cloud/leaders_elect/shard1/election), 
which allows cloud85 to return from waitForReplicasToComeUp and try to determine 
who should be the leader.
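
For anyone not familiar with the mechanism, joining the election is essentially
an ephemeral, sequential znode create under the shard's election path. Roughly
like this (a bare-bones sketch against the plain ZooKeeper API, not Solr's
LeaderElector; the connect string and node identity are placeholders):

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ElectionJoinSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connect string; in Solr this comes from zkHost.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            // Register as a leader candidate: an ephemeral, sequential znode
            // under the shard's election path.
            String myNode = zk.create(
                "/collections/cloud/leaders_elect/shard1/election/n_",
                "cloud84:8984_solr".getBytes(),   // placeholder node identity
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);

            // Everyone currently in line; the lowest sequence number wins.
            List<String> candidates =
                zk.getChildren("/collections/cloud/leaders_elect/shard1/election", false);
            System.out.println("joined as " + myNode + ", candidates: " + candidates);

            zk.close();
        }
    }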

cloud85 then tries to run the SyncStrategy, which can never work in this 
scenario because the Jetty HTTP listener is not yet active on either node, so 
all replication work that uses HTTP requests fails on both nodes ... PeerSync 
treats these failures as a sign that the other replicas in the shard are simply 
unavailable and counts them as success. Here's the log message:

2013-12-11 11:43:25,936 [coreLoadExecutor-3-thread-1] WARN solr.update.PeerSync 
- PeerSync: core=cloud_shard1_replica1 url=http://cloud85:8985/solr couldn't 
connect to http://cloud84:8984/solr/cloud_shard1_replica2/, counting as success
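
To spell the failure mode out, here's an illustrative sketch of that behavior
(not the actual PeerSync source), showing how a refused connection ends up
reported as success when the peer's Jetty isn't listening yet:

    import java.io.IOException;
    import java.net.ConnectException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Illustrative sketch only; NOT the actual PeerSync implementation.
    public class PeerSyncSketch {
        static boolean syncWith(String replicaUrl) {
            try {
                HttpURLConnection conn =
                    (HttpURLConnection) new URL(replicaUrl).openConnection();
                conn.setConnectTimeout(5000);
                conn.connect();               // fails fast if nothing is listening
                conn.disconnect();
                return true;                  // peer answered; real sync would proceed
            } catch (ConnectException e) {
                // Peer refused the connection (e.g. Jetty not started yet).
                // Mirrors the log above: "couldn't connect ... counting as success".
                System.out.println("couldn't connect to " + replicaUrl + ", counting as success");
                return true;                  // a failure counted as success
            } catch (IOException e) {
                return false;
            }
        }

        public static void main(String[] args) {
            // With both Jetty listeners still down, this "succeeds" and the
            // caller happily concludes it is in sync.
            System.out.println(syncWith("http://cloud84:8984/solr/cloud_shard1_replica2/"));
        }
    }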

The Jetty HTTP listener doesn't start accepting connections until long after 
this process has completed and the wrong leader has already been selected.

From what I can see, we have a leader recovery process that depends partly on 
HTTP requests to the other nodes, but the HTTP listener on those nodes isn't 
active yet. We need a leader recovery process that doesn't rely on HTTP 
requests. Perhaps leader recovery for a shard without a current leader needs to 
work differently than leader election in a shard that has replicas able to 
respond to HTTP requests? Everything I'm seeing makes perfect sense for leader 
election when there are active replicas and the current leader fails.
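
As a rough illustration of the direction I mean: the state ZooKeeper already
holds could be consulted without any HTTP round trips, e.g. something like the
sketch below (plain ZooKeeper API, not a worked-out design; a live ZK session
doesn't prove a node's Jetty is accepting requests, so this alone isn't a
complete answer):

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch only: ask ZooKeeper which nodes it currently considers live,
    // with no HTTP involved.
    public class LiveNodesSketch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});
            List<String> liveNodes = zk.getChildren("/live_nodes", false);
            System.out.println("live nodes according to ZK: " + liveNodes);
            zk.close();
        }
    }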

All this aside, I'm not asserting that this is the only cause of the 
out-of-sync issues reported in this ticket, but it definitely seems like 
something that could happen in a real cluster.

> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
>                 Key: SOLR-4260
>                 URL: https://issues.apache.org/jira/browse/SOLR-4260
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>         Environment: 5.0.0.2013.01.04.15.31.51
>            Reporter: Markus Jelsma
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.7
>
>         Attachments: 192.168.20.102-replica1.png, 
> 192.168.20.104-replica2.png, clusterstate.png
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using 
> CloudSolrServer, we see inconsistencies between the leader and replica for 
> some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
> a small deviation in the number of documents. The leader and slave deviate by 
> roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my 
> attention: there were small IDF differences for exactly the same record, 
> causing it to shift positions in the result set. During those tests no 
> records were indexed. Consecutive catch-all queries also return different 
> numDocs.
> We're running a 10-node test cluster with 10 shards and a replication factor 
> of two, and we frequently reindex using a fresh build from trunk. I hadn't seen 
> this issue for quite some time until a few days ago.


