[ https://issues.apache.org/jira/browse/SOLR-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680981#comment-16680981 ]

Shawn Heisey commented on SOLR-12974:
-------------------------------------

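One possible workaround is to upgrade to 7.x and use the new TLOG or PULL 
replica types.  Instead of indexing every document independently the way NRT 
replicas do, non-leader TLOG and PULL replicas copy the leader's index 
segments, so their index versions should track the leader's.  The downside is 
that this requires upgrading to a new major version; if you have a test 
environment, that may not be a major problem.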
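For reference, a 7.x collection made only of TLOG and PULL replicas can be created with the Collections API `CREATE` action and its `nrtReplicas`, `tlogReplicas` and `pullReplicas` parameters. The Java sketch below only assembles such a request URL; the helper `createUrl`, the host, the collection name `mycollection` and the config set name `myconfig` are hypothetical placeholders, and the URL still has to be sent with whatever HTTP client you already use. Even with these replica types, replicas may lag the leader briefly between replication polls, so treat this as a mitigation rather than a hard guarantee.

```java
/**
 * Hypothetical helper (not part of Solr) that builds a Collections API CREATE
 * request for a collection made only of TLOG and PULL replicas (Solr 7.x+).
 * Host, collection name and config set name are placeholders, and the values
 * are assumed to be URL-safe already (no encoding is done here).
 */
public class CreateTlogPullCollection {

    public static String createUrl(String baseUrl, String name, String configName,
                                   int numShards, int tlogReplicas, int pullReplicas) {
        return baseUrl + "/admin/collections?action=CREATE"
                + "&name=" + name
                + "&collection.configName=" + configName
                + "&numShards=" + numShards
                // No NRT replicas: non-leader replicas copy the leader's segments.
                + "&nrtReplicas=0"
                // TLOG replicas can become leader; PULL replicas never can.
                + "&tlogReplicas=" + tlogReplicas
                + "&pullReplicas=" + pullReplicas;
    }

    public static void main(String[] args) {
        // Placeholder values; send the resulting URL with curl or any HTTP client.
        String url = createUrl("http://localhost:8983/solr", "mycollection",
                "myconfig", 1, 2, 1);
        System.out.println(url);
    }
}
```

With two TLOG replicas, one of them is elected leader and the other copies its index; the PULL replica only ever copies and serves queries.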

I suspect that it would be very difficult to guarantee the same index version 
when using NRT replicas, which was the only type before 7.x.  I could be wrong 
about that.
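To illustrate why a differing index version changes the order, here is a deliberately simplified Java model. It is an assumption-laden sketch, not the actual `RandomSortField` code (the real seed computation differs in its details): a hypothetical `seed()` adds the field name's `hashCode()` to the index version, and each document's sort key is an integer mix of its doc ID plus that seed. Two replicas holding the same documents but reporting different versions will therefore generally sort them differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Simplified, hypothetical model of a RandomSortField-style ordering.
 * NOT Solr's implementation: it only shows that the sort key depends on
 * both the field name and the index version.
 */
public class RandomOrderSketch {

    // Bijective 32-bit integer mixer (xor-shift and odd-constant multiply steps).
    public static int mix(int x) {
        x ^= x >>> 16;
        x *= 0x7feb352d;
        x ^= x >>> 15;
        x *= 0x846ca68b;
        x ^= x >>> 16;
        return x;
    }

    // Hypothetical seed: field name hash combined with the index version.
    public static int seed(String fieldName, long indexVersion) {
        return fieldName.hashCode() + (int) indexVersion;
    }

    // Pseudo-random sort key for one document under a given seed.
    public static int sortKey(int docId, int seed) {
        return mix(docId + seed);
    }

    // Doc IDs 0..numDocs-1 ordered by their pseudo-random keys.
    public static List<Integer> order(String fieldName, long indexVersion, int numDocs) {
        int s = seed(fieldName, indexVersion);
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < numDocs; d++) {
            docs.add(d);
        }
        docs.sort(Comparator.comparingInt(d -> sortKey(d, s)));
        return docs;
    }

    public static void main(String[] args) {
        String field = "random_SW84gaDAf3RynhOyGQDZlgAAAYc1";
        // Same field name, same documents, two replicas with different versions.
        System.out.println("replica A (version 41): " + order(field, 41L, 10));
        System.out.println("replica B (version 57): " + order(field, 57L, 10));
    }
}
```

Passing the same version twice always yields the same list, which is the guarantee the schema comment describes. In a real index the internal Lucene doc IDs can also differ between replicas whose segment layouts differ, which only adds to the divergence.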


> RandomSort not consistent in SolrCloud Mode
> -------------------------------------------
>
>                 Key: SOLR-12974
>                 URL: https://issues.apache.org/jira/browse/SOLR-12974
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 6.5.1
>            Reporter: Shrey Shivam
>            Priority: Minor
>
> Expected behaviour of RandomSort is that, given the same random field name 
> (random_<seed>), which acts as a seed, the sorting order will remain 
> consistent for the same version of the Solr index.
> From schema.xml:
> {{~<!-- The "RandomSortField" is not used to store or search any data. You 
> can declare fields of this type it in your schema to generate pseudo-random 
> orderings of your docs for sorting or function purposes. The ordering is 
> generated based on the field name and the version of the index. As long as 
> the index version remains unchanged, and the same field name is reused, the 
> ordering of the docs will be consistent. If you want different psuedo-random 
> orderings of documents, for the same version of the index, use a dynamicField 
> and change the field name in the request. -->~}}
>  
> In master-slave mode, replication is driven by the index version: if a 
> slave's index version differs from the master's, the slave replicates from 
> the master, and its index version is updated to match the master's.
> In SolrCloud mode, however, we have observed that replicas of the same shard 
> do not always have the same index version, even though their documents are 
> the same and consistent. 
> This has been discussed previously on the [mailing list|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201508.mbox/%3ccae3utzmggprv-p6juwjwm2yyyxfw893xayq7+2hav7mmobm...@mail.gmail.com%3E] 
> as well.
> {quote}SolrCloud works very differently than the old master-slave replication.
> The index is NOT copied from the leader to the other replicas, except
>  in extreme recovery circumstances.
> Each replica builds its own copy of the index independently from the
>  others. Due to slight timing differences in the indexing operations,
>  and possible actions related to transaction log replay on node restart,
>  each replica may end up with a different index layout. There also could
>  be differences in the number of deleted documents. Unless something
>  goes really wrong, all replicas should contain the same live documents.
> {quote}
>  
> When a query is sent to a shard that has two or more replicas, any one of 
> those replicas may be chosen to respond. If the replicas do not all have the 
> same index version, RandomSortField will generate a different hash seed for 
> the same random_<seed> field name.
> In the source code of the 
> [RandomSortField|https://github.com/apache/lucene-solr/blob/branch_6_5/solr/core/src/java/org/apache/solr/schema/RandomSortField.java]
>  class, line 86 uses the index version (of the shard) to compute the random 
> hash seed.
> Hence, for the same query against a Solr collection, Solr returns different 
> results depending on which replica serves the request each time and whether 
> that replica's index version differs from the others.
>  
> Example of a Solr query that uses the random field:
> {code}
> https://solr-stage.mydomain.com:8983/solr/mycollection/select?wt=json&q=*:*&defType=edismax&fl=id&boost=if(query({!v='documentDate:[2018-11-07 TO *]'}),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),1),sub(1,div(1,1))),if(or(exists(query({!v='documentType:sponsored'})),exists(query({!v='documentType:featured'}))),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),4),sub(1,div(1,4))),if(or(exists(query({!v='documentType:listing'})),exists(query({!v='documentType:promotional'}))),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),2),sub(1,div(1,2))),scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1))))
> {code}
>  
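Each branch of the boost in the example query applies the same transform to the random value. Writing $r$ for `scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1)`, which lies in $[0,1]$, and $k$ for the divisor used in a branch, the expression `sum(div(r,k),sub(1,div(1,k)))` works out to:

$$
f_k(r) = \frac{r}{k} + \left(1 - \frac{1}{k}\right),
\qquad 0 \le r \le 1 \;\Longrightarrow\; 1 - \frac{1}{k} \le f_k(r) \le 1
$$

So documents matching the documentDate range ($k = 1$) get $r$ itself in $[0, 1]$; otherwise sponsored or featured documents ($k = 4$) land in $[0.75, 1]$, listing or promotional documents ($k = 2$) land in $[0.5, 1]$, and everything else falls back to $r$ in $[0, 1]$. The ranges are fixed by the query, but where each document falls inside its range comes from $r$, which is exactly the value that changes when replicas report different index versions.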



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
