[ https://issues.apache.org/jira/browse/SOLR-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Novikov updated SOLR-5821:
--------------------------------

    Attachment: Screen Shot 2014-04-05 at 2.26.41 AM.png
                Screen Shot 2014-04-05 at 2.26.26 AM.png

SolrCloud infrastructure:

3 ZooKeeper nodes + 3 Solr replicas (1 shard) on Tomcat 7.

While data was being imported from the database through one of the Solr 
instances (DataImportHandler), another Solr instance went down and had to be 
restarted. The screenshots show the result: the first machine holds 9,812,001 
items, while the one that was down for a couple of seconds holds 9,811,987.
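
A mismatch like this can be checked without screenshots by asking each replica for its own document count. The following is only a minimal sketch: the host names, port, and core name `collection1` are hypothetical placeholders, and it assumes each core answers a standard `/select` request. The `distrib=false` parameter restricts the query to the core that receives it, so the count is not merged across replicas.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_url):
    """Build a URL that counts documents on a single core only."""
    params = {"q": "*:*", "rows": 0, "wt": "json", "distrib": "false"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def parse_num_found(body):
    """Extract numFound from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


def replica_counts(core_urls, fetch=None):
    """Return {core_url: document count} for every core URL given.

    `fetch` takes a URL and returns the response body as text; it is
    injectable so the logic can be exercised without a running Solr.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8")
    return {url: parse_num_found(fetch(count_url(url))) for url in core_urls}


def out_of_sync(counts):
    """True if the replicas do not all report the same count."""
    return len(set(counts.values())) > 1


if __name__ == "__main__":
    # Hypothetical core URLs; replace with the real replica addresses.
    cores = [
        "http://solr1:8080/solr/collection1",
        "http://solr2:8080/solr/collection1",
        "http://solr3:8080/solr/collection1",
    ]
    counts = replica_counts(cores)
    for url, n in counts.items():
        print(url, n)
    print("out of sync" if out_of_sync(counts) else "in sync")
```

Equal counts do not prove that the indexes are identical, so this only detects the kind of gap described above, it does not repair it.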

PS: The worst part is that I can't see a way to synchronize them now. 
Replication requests via HTTP don't seem to work: in SolrCloud all the nodes 
behave like "masters", so an HTTP replication request (pulling data from the 
master to a slave) just fails. Even if it worked, it wouldn't really be 
appropriate, because you would need to run consistency checks all the time (as 
data keeps coming in) and repair any discrepancies on your own.

> Search inconsistency on SolrCloud replicas
> ------------------------------------------
>
>                 Key: SOLR-5821
>                 URL: https://issues.apache.org/jira/browse/SOLR-5821
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.6.1, 4.7.1
>         Environment: SolrCloud:
> 1 shard, 2 replicas
> Both instances/replicas have identical hardware/software:
> CPU(s): 4
> RAM: 8Gb
> HDD: 100Gb
> OS: CentOS 6.5
> ZooKeeper 3.4.5
> Tomcat 8.0.3
> Solr 4.6.1
> Servers are utilized to run Solr only.
>            Reporter: Maxim Novikov
>            Priority: Critical
>              Labels: cloud, inconsistency, replica, search
>         Attachments: Screen Shot 2014-04-05 at 2.26.26 AM.png, Screen Shot 
> 2014-04-05 at 2.26.41 AM.png
>
>
> We use the following infrastructure:
> SolrCloud with 1 shard and 2 replicas. The index is built using 
> DataImportHandler (importing data from the database). The number of items in 
> the index can vary from 100 to 100,000,000.
> After indexing part of the data (not necessarily all the data, it is enough 
> to have a small number of items in the search index), we can observe that 
> Solr instances (replicas) return different results for the same search 
> queries. I believe it happens because some of the results have the same 
> scores, and Solr instances return those in a random order.
> PS This is a critical issue for us as we use a load balancer to scale Solr 
> through replicas, and as a result of this issue, we retrieve various results 
> for the same queries all the time. They are not necessarily completely 
> different, but even a couple of items that differ is a deal breaker.
> The expected behaviour would be to always get identical results for the same 
> search queries from all replicas. Otherwise, the "cloud" setup is simply 
> unreliable.
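
The hypothesis in the description (documents with equal scores coming back in a different order on each replica) can be illustrated with a small simulation. This is a sketch, not Solr code: the documents, scores, and the `rank` helper are made up for illustration. It shows why a secondary sort on a unique field makes the order deterministic; in Solr the corresponding request parameter would be along the lines of `sort=score desc,id asc`, assuming the schema's unique key field is named `id`.

```python
def rank(docs, tie_break=False):
    """Return document ids ordered by descending score.

    Python's sort is stable, so without a tie-breaker, documents with
    equal scores keep their stored order. That stored order stands in
    for the internal index order, which can differ between replicas.
    """
    if tie_break:
        key = lambda d: (-d["score"], d["id"])  # score desc, id asc
    else:
        key = lambda d: -d["score"]             # score desc only
    return [d["id"] for d in sorted(docs, key=key)]


# Same three documents, stored in a different order on each replica.
replica_a = [
    {"id": "doc1", "score": 1.0},
    {"id": "doc2", "score": 1.0},
    {"id": "doc3", "score": 0.5},
]
replica_b = [replica_a[1], replica_a[0], replica_a[2]]

if __name__ == "__main__":
    print(rank(replica_a), rank(replica_b))              # orders differ
    print(rank(replica_a, True), rank(replica_b, True))  # orders match
```

The tie-breaker only addresses ordering among equal scores. It does not help when the replicas hold different documents, as in the count mismatch reported in the comment.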



