[
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425767#comment-13425767
]
Per Steffensen commented on SOLR-2592:
--------------------------------------
Maybe it is a little late to give this kind of input, and I have to admit that
I have not read all the comments already on this issue, but:
ElasticSearch has already solved this problem by separating "unique id" and
"routing". It works very well, and maybe you would benefit from taking a look
at how they do it.
There are some potentially problematic issues with this. If we allow dynamic
routing, meaning that either the document properties the routing is based on
are allowed to change, or the routing-rule itself is allowed to change (while
you already have documents in your collection), you will have a hard time
making sure that updated documents live in the correct slice at all times, and
that old versions of the document are not still living in other slices where
the document used to belong.
Example
Routing-rule: hash(doc.lastname)%number_of_slices_in_collection
Events:
- A document with hash(lastname)%#slices equal to 5 is inserted into Solr. The
document is stored in slice no 5.
- A client loads the document and changes lastname, so that
hash(lastname)%#slices is now 3. From now on this particular document needs to
be stored in slice no 3 (potentially running in different Solr instances than
the ones running slice no 5). And you need to make sure the old version of the
document is deleted from slice no 5.
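The events above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the names route, naive_update and
slices_holding are hypothetical, CRC32 stands in for the unspecified hash, and
the slice count of 8 is arbitrary. None of this is Solr's actual routing code.

```python
import zlib

NUM_SLICES = 8  # assumed slice count, for illustration only


def route(lastname, num_slices=NUM_SLICES):
    """Routing rule from the example: hash(lastname) % number of slices."""
    # CRC32 is an assumed stand-in for the hash; unlike Python's built-in
    # hash() for strings, it gives the same value on every run.
    return zlib.crc32(lastname.encode("utf-8")) % num_slices


def naive_update(slices, doc):
    """Write doc to the slice its current lastname routes to.

    Nothing is cleaned up in the slice the document used to live in,
    which is exactly the problem described above.
    """
    target = route(doc["lastname"], len(slices))
    slices[target][doc["id"]] = dict(doc)
    return target


def slices_holding(slices, doc_id):
    """Return the numbers of all slices that contain a copy of doc_id."""
    return [n for n, docs in enumerate(slices) if doc_id in docs]
```

If a document is written with naive_update, then has its lastname changed to a
value that routes elsewhere and is written again, slices_holding reports two
slices for the same id: the new copy and the stale one.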
In this case it is not simple to do the document-based synchronization (the
bucket magic) that prevents two clients from making two concurrent updates of
the same document and leaving it inconsistent. Version-check/optimistic
locking becomes very hard.
ElasticSearch (as I remember) solves this by allowing the routing to be based
on document properties, but the routing is calculated once and for all at
insertion time, and is stored as a "routing"-field on the document. If you
later change the document-fields that the routing is based on, the routing
itself is not changed. This way documents never change where they live, but
basically you cannot use the routing-rules to decide where to find your
document.
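A minimal sketch of that idea, as recalled above, might look as follows. It is
not ElasticSearch's API: the functions insert, update and slice_for, the use of
lastname as the routing source, and CRC32 as the hash are all assumptions made
for illustration.

```python
import zlib


def slice_for(routing_value, num_slices):
    """Map a stored routing value to a slice number (CRC32 is an assumption)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_slices


def insert(doc):
    """On first insertion, compute the routing value once and store it."""
    stored = dict(doc)
    # Hypothetical rule: the routing value is derived from lastname.
    stored["routing"] = doc["lastname"]
    return stored


def update(stored, changes):
    """Apply field changes but keep the routing value frozen at insertion."""
    updated = {**stored, **changes}
    updated["routing"] = stored["routing"]
    return updated
```

After update(stored, {"lastname": "Jensen"}) the document still maps to the
slice chosen at insertion, because slice_for is always called with the stored
"routing" value. The cost is the one already mentioned: applying the rule to
the current lastname no longer tells you where the document lives.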
My advice is to make sure the routing of a document cannot change. So when it
is created in a slice it will live there forever, no matter what kind of
updates are made to the document. If routing rules are based on property
values, then to keep document location and property values consistent, you
should not allow the properties that the routing-rule is based on to change.
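One way to picture this restriction is an update check that rejects any change
to a routing property. The sketch below is hypothetical: ROUTING_FIELDS,
RoutingFieldChanged and check_update are names invented for the example and do
not exist in Solr.

```python
ROUTING_FIELDS = ("lastname",)  # hypothetical: fields the routing-rule uses


class RoutingFieldChanged(ValueError):
    """Raised when an update tries to change a field the routing is based on."""


def check_update(existing, updated, routing_fields=ROUTING_FIELDS):
    """Reject the update if any routing field differs from the stored value."""
    for field in routing_fields:
        if existing.get(field) != updated.get(field):
            raise RoutingFieldChanged(
                "field %r is used for routing and cannot change" % field
            )
    return updated
```

With such a check in front of every update, the slice computed from the
properties at creation time stays valid for the lifetime of the document.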
Regards, Per Steffensen
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
> Key: SOLR-2592
> URL: https://issues.apache.org/jira/browse/SOLR-2592
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Affects Versions: 4.0-ALPHA
> Reporter: Noble Paul
> Assignee: Mark Miller
> Attachments: SOLR-2592.patch, SOLR-2592_rev_2.patch, dbq_fix.patch,
> pluggable_sharding.patch, pluggable_sharding_V2.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash,
> attribute value etc) It will be easy to narrow down the search to a smaller
> subset of shards and in effect can achieve more efficient search.