[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823652#comment-13823652 ]

Mark Miller commented on SOLR-4260:
-----------------------------------

bq. What if the difference is greater than 100? Is there any other way to 
figure out who is the "truth" and force that state onto the other replicas by 
doing a full sync?

That is basically what should happen - everyone in the leader line will try to 
become the leader by attempting a peer sync with everyone else - either they 
will be ahead of everyone else and the sync will succeed, or they will be behind 
by fewer than 100 updates, fetch the missing updates, and the sync will succeed. 
If the sync fails, the next guy in line tries. Eventually the most up-to-date 
guy should succeed, and he forces everyone else to match him. That is the idea 
anyway.
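
To make that concrete, here is a rough sketch of the fall-through loop - these 
are not Solr's actual election or PeerSync classes, just illustrative names:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these types and methods are made up for this sketch;
// they are not Solr's real leader-election or PeerSync APIs.
class LeaderLineSketch {

  interface Replica {
    // Try to peer sync with the other replicas. Succeeds if this replica is
    // ahead of everyone else, or behind by fewer than ~100 updates that it
    // can fetch and apply during the sync.
    boolean peerSync(List<Replica> others);

    // Become leader and force the others to match this replica's index.
    void becomeLeaderAndSyncOthers(List<Replica> others);
  }

  // Walk the leader line in order; the first candidate whose peer sync
  // succeeds becomes the leader, and everyone else is made to match it.
  static Replica electLeader(List<Replica> leaderLine) {
    for (Replica candidate : leaderLine) {
      List<Replica> others = new ArrayList<>(leaderLine);
      others.remove(candidate);
      if (candidate.peerSync(others)) {
        candidate.becomeLeaderAndSyncOthers(others);
        return candidate;
      }
      // Sync failed (too far behind) - the next replica in line tries.
    }
    return null; // nobody could sync; full recovery has to take over
  }
}
{code}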

bq. Newbie question: Why would the leader be behind? 

ZooKeeper session timeouts (due to load, gc, whatever) can cause the leader to 
be bumped.

You mainly only expect this stuff to happen when nodes go down (and perhaps come 
back) or sessions expire.
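
For example, a GC pause longer than the ZooKeeper session timeout expires the 
session, the leader's ephemeral election node disappears, and it has to rejoin 
the line when it comes back - by which time it may be behind. A rough sketch 
using the plain ZooKeeper client API (rejoinElection() is a hypothetical hook):

{code:java}
import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Sketch of how a session expiration "bumps" a leader: the ephemeral
// election node is tied to the session, so when the session times out
// (load, GC, whatever) ZooKeeper removes it and another replica can
// take over. rejoinElection() below is hypothetical.
public class SessionExpirySketch implements Watcher {

  private final ZooKeeper zk;

  public SessionExpirySketch(String zkHost, int sessionTimeoutMs) throws IOException {
    this.zk = new ZooKeeper(zkHost, sessionTimeoutMs, this);
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getState() == Watcher.Event.KeeperState.Expired) {
      // Our session is gone, and with it the ephemeral leader registration.
      // We need a new session and have to rejoin the election from the back
      // of the line; by then the new leader may have updates we don't.
      rejoinElection();
    }
  }

  private void rejoinElection() {
    // hypothetical: open a new ZooKeeper session and re-register for election
  }
}
{code}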

Unfortunately, for a while between 4.4 and 4.5, a couple of our important tests 
stopped working, and I think a couple of problems were introduced. I hope to 
have more time to look into it soon.

> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
>                 Key: SOLR-4260
>                 URL: https://issues.apache.org/jira/browse/SOLR-4260
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0
>         Environment: 5.0.0.2013.01.04.15.31.51
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 5.0
>
>         Attachments: 192.168.20.102-replica1.png, 
> 192.168.20.104-replica2.png, clusterstate.png
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using 
> CloudSolrServer, we see inconsistencies between the leader and replica for 
> some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
> a small deviation in the number of documents. The leader and replica deviate 
> by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my 
> attention: there were small IDF differences for exactly the same record, 
> causing it to shift positions in the result set. During those tests no 
> records were indexed. Consecutive catch-all queries also return different 
> numDocs.
> We're running a 10-node test cluster with 10 shards and a replication factor 
> of two, and we frequently reindex using a fresh build from trunk. I had not 
> seen this issue for quite some time until a few days ago.
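
A quick way to confirm a mismatch like this is to query each core of a shard 
directly with distrib=false and compare the local counts. A small SolrJ sketch - 
the host and core names below are just placeholders:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: compare numDocs of a leader and its replica by querying each core
// directly (distrib=false keeps the request on that core only).
// Host and core names are placeholders.
public class NumDocsCheck {
  public static void main(String[] args) throws SolrServerException {
    String[] coreUrls = {
        "http://192.168.20.102:8983/solr/collection1_shard1_replica1",
        "http://192.168.20.104:8983/solr/collection1_shard1_replica2"
    };
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);
    q.set("distrib", false);   // do not fan out to the rest of the collection
    for (String url : coreUrls) {
      HttpSolrServer server = new HttpSolrServer(url);
      QueryResponse rsp = server.query(q);
      System.out.println(url + " numFound=" + rsp.getResults().getNumFound());
      server.shutdown();
    }
  }
}
{code}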


