[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425767#comment-13425767
 ] 

Per Steffensen commented on SOLR-2592:
--------------------------------------

Maybe it is a little late to give this kind of input, and I have to admit that 
I have not read all the comments already on this issue, but:
ElasticSearch has already solved this problem by separating "unique id" and 
"routing". It works very well, and maybe you would benefit from taking a look 
at how they do it.
There are some potentially problematic issues with this. If we allow dynamic 
routing, meaning that either the document properties the routing is based on 
are allowed to change, or the routing rule itself is allowed to change (while 
you already have documents in your collection), you will have a hard time 
making sure that updated documents live in the correct slice at all times, and 
that old versions of a document do not linger in other slices where the 
document used to live.

Example
Routing-rule: hash(doc.lastname)%number_of_slices_in_collection
Events:
- Document with a hash(lastname)%#slices equal to 5 is inserted into Solr. The 
document is stored in slice no 5.
- Client loads document and changes lastname, so that hash(lastname)%#slices is 
now 3. From now on this particular document needs to be stored in slice no 3 
(potentially running in different Solr instances than the ones running slice no 
5). And you need to make sure the old version of the document is deleted on 
slice 5.
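
The events above can be replayed in a few lines of Python. This is only a 
sketch under stated assumptions: `toy_hash` (a sum of character codes) stands 
in for whatever hash function would really be used, the slice count of 8 and 
the lastnames "Smith" and "Jensen" are made up for illustration, and slices 
are modelled as plain dicts rather than Solr cores.

```python
NUM_SLICES = 8  # assumed slice count, chosen only for illustration


def toy_hash(value: str) -> int:
    """Deterministic stand-in for a real hash function (not what Solr uses)."""
    return sum(ord(ch) for ch in value)


def route(doc: dict) -> int:
    """Routing rule from the example: hash(doc.lastname) % number_of_slices."""
    return toy_hash(doc["lastname"]) % NUM_SLICES


def naive_index(slices: list, doc: dict) -> None:
    """Store the document in whichever slice the rule picks right now."""
    slices[route(doc)][doc["id"]] = doc


def slices_holding(slices: list, doc_id: str) -> list:
    """Return the numbers of all slices that contain a copy of doc_id."""
    return [n for n, s in enumerate(slices) if doc_id in s]


slices = [dict() for _ in range(NUM_SLICES)]
naive_index(slices, {"id": "doc-1", "lastname": "Smith"})   # routed to slice 5
naive_index(slices, {"id": "doc-1", "lastname": "Jensen"})  # routed to slice 3
print(slices_holding(slices, "doc-1"))  # [3, 5]
```

Without an explicit delete on slice 5, the stale "Smith" version stays 
searchable next to the new "Jensen" version in slice 3.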

In this case it is not simple to do document-based synchronization (the bucket 
magic) that prevents two clients from making two concurrent updates of the same 
document, which would lead to inconsistency. Version-check/optimistic locking 
becomes very hard.
ElasticSearch (as I remember) solves this by allowing the routing to be based 
on document properties, but the routing is calculated once and for all at 
insertion time, and is stored as a "routing" field on the document. If you 
later change the document fields that the routing is based on, the routing 
itself is not changed. This way documents never change where they live, but 
basically you can no longer apply the routing rule to a document's current 
field values to decide where to find it.
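
A minimal sketch of the behaviour described above, as recalled in this 
comment and not taken from ElasticSearch's actual API: the routing value is 
frozen at insertion, stored on the document, and must be supplied again on 
later updates. `toy_hash`, `insert`, `update`, the slice count and the field 
names are all hypothetical.

```python
NUM_SLICES = 8  # assumed slice count, chosen only for illustration


def toy_hash(value: str) -> int:
    """Deterministic stand-in for a real hash function."""
    return sum(ord(ch) for ch in value)


def insert(slices: list, doc: dict) -> str:
    """Compute the routing once, store it on the document, return it."""
    routing = doc["lastname"]  # frozen at insertion time
    slices[toy_hash(routing) % NUM_SLICES][doc["id"]] = dict(doc, routing=routing)
    return routing


def update(slices: list, doc_id: str, routing: str, changes: dict) -> int:
    """Apply changes in the slice named by the stored routing value."""
    slice_no = toy_hash(routing) % NUM_SLICES
    stored = slices[slice_no][doc_id]
    # The "routing" field itself is never overwritten by an update.
    stored.update({k: v for k, v in changes.items() if k != "routing"})
    return slice_no


slices = [dict() for _ in range(NUM_SLICES)]
routing = insert(slices, {"id": "doc-1", "lastname": "Smith"})
slice_no = update(slices, "doc-1", routing, {"lastname": "Jensen"})
print(slice_no)                           # 5: the document did not move
print(toy_hash("Jensen") % NUM_SLICES)    # 3: the rule on current data is wrong
```

The price is visible in the last line: the caller has to remember the 
original routing value ("Smith"), because the current lastname no longer 
leads to the slice where the document lives.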

My advice is to make sure the routing of a document cannot change. So once it 
is created in a slice it will live there forever, no matter what kind of 
updates are made to the document. If routing rules are based on property 
values, then to keep document location and property value consistent, you 
should not allow the properties that the routing rule is based on to change.
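
One way to read this advice is as a guard in the update path that rejects any 
change to a routing property. The sketch below is an illustration only: 
`ROUTING_FIELDS`, `RoutingFieldChange` and `apply_update` are hypothetical 
names, not part of Solr or of any attached patch.

```python
# Fields the (hypothetical) routing rule is based on.
ROUTING_FIELDS = frozenset({"lastname"})


class RoutingFieldChange(ValueError):
    """Raised when an update tries to modify a routing property."""


def apply_update(stored: dict, changes: dict) -> dict:
    """Return an updated copy of stored, refusing routing-field changes."""
    for field in ROUTING_FIELDS:
        if field in changes and changes[field] != stored.get(field):
            raise RoutingFieldChange(
                f"field {field!r} determines routing and cannot be changed"
            )
    updated = dict(stored)
    updated.update(changes)
    return updated


doc = {"id": "doc-1", "lastname": "Smith", "city": "Aarhus"}
doc = apply_update(doc, {"city": "Odense"})      # accepted
try:
    apply_update(doc, {"lastname": "Jensen"})    # rejected
except RoutingFieldChange as err:
    print("rejected:", err)
```

With such a guard the routing rule can still be applied to the current 
document to find its slice, since the inputs to the rule never change.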

Regards, Per Steffensen
                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0-ALPHA
>            Reporter: Noble Paul
>            Assignee: Mark Miller
>         Attachments: SOLR-2592.patch, SOLR-2592_rev_2.patch, dbq_fix.patch, 
> pluggable_sharding.patch, pluggable_sharding_V2.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value, etc.), it will be easy to narrow down the search to a 
> smaller subset of shards and, in effect, achieve a more efficient search.
