[ https://issues.apache.org/jira/browse/BLUR-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558267#comment-13558267 ]
Aaron McCurry commented on BLUR-55:
-----------------------------------
-I wonder if there's a real use case for client-determined, per-update shard
assignment? In other words, I like the idea of a pluggable sharding strategy,
but I don't see why it's not pluggable in the server conf, or maybe as part of
the table descriptor?
This is exactly what I was thinking; that's how the properties hanging off the
table descriptor should be used. Also, a note on the internal implementation:
table-specific settings like this should be configured in the TableContext.
If you take a look at that object, you will see other options that are
implemented there.
-If we do keep the current approach, it seems like making the shard assignment
per-document would be better than putting it on the MutationOption (where
"better" means more client-friendly :) )?
Agreed; this is the basic flaw I was describing in my previous comment.
-Though maybe I'm missing something, because I don't yet understand how the
mutation's shardIndex assignment ties in with retrieval (the doc thrift
function). Is the shard assignment somehow packed into the docLocation?
It is; that's why it's a long. The first 32 bits are the shard id, and the
second 32 bits are the Lucene doc id.
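A sketch of that packing, assuming "first 32 bits" means the high-order half of the long. The `DocLocation` class and its method names are hypothetical. They only illustrate the bit layout described above.

```java
// Hypothetical sketch: packing a shard id and a Lucene doc id into one long,
// assuming the shard id occupies the upper 32 bits.
final class DocLocation {

    private DocLocation() {}

    /** Packs shardId into the upper 32 bits and docId into the lower 32 bits. */
    static long pack(int shardId, int docId) {
        // Mask docId so a negative int cannot sign-extend over the shard bits.
        return ((long) shardId << 32) | (docId & 0xFFFFFFFFL);
    }

    /** Extracts the shard id from the upper 32 bits. */
    static int shardId(long location) {
        return (int) (location >>> 32);
    }

    /** Extracts the Lucene doc id from the lower 32 bits. */
    static int docId(long location) {
        return (int) location;
    }

    public static void main(String[] args) {
        long loc = pack(3, 42);
        System.out.println(loc);          // 12884901930
        System.out.println(shardId(loc)); // 3
        System.out.println(docId(loc));   // 42
    }
}
```

Because the shard id travels inside the location, the doc call can route directly to the right shard without any separate lookup.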
Overall, I would like to remove the shardIndex from the API for mutate calls
altogether. And because the strategy would likely control the number of shards
in a table, we should try to remove the number of shards defined in the
TableDescriptor object.
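A minimal sketch of that direction, under stated assumptions: the mutate call takes no shardIndex, and the shard count comes from the strategy rather than from the TableDescriptor. `ShardingStrategy`, `FixedHashStrategy` and `Table` are hypothetical names. Each shard is modeled as an in-memory list only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the strategy owns the shard count and placement,
// so the client never passes a shardIndex. Names are illustrative only.
interface ShardingStrategy {
    /** Number of shards this strategy manages for the table. */
    int shardCount();

    /** Chooses the shard for a row; never supplied by the client. */
    int shardFor(String rowId);
}

final class FixedHashStrategy implements ShardingStrategy {
    private final int shards;

    FixedHashStrategy(int shards) {
        if (shards <= 0) throw new IllegalArgumentException("shards must be positive");
        this.shards = shards;
    }

    @Override public int shardCount() { return shards; }

    @Override public int shardFor(String rowId) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(rowId.hashCode(), shards);
    }
}

final class Table {
    private final ShardingStrategy strategy;
    private final List<List<String>> shards = new ArrayList<>();

    Table(ShardingStrategy strategy) {
        this.strategy = strategy;
        // The shard count is taken from the strategy, not from a descriptor field.
        for (int i = 0; i < strategy.shardCount(); i++) {
            shards.add(new ArrayList<>());
        }
    }

    /** Mutate without a shardIndex argument: the server decides placement. */
    int mutate(String rowId) {
        int shard = strategy.shardFor(rowId);
        shards.get(shard).add(rowId);
        return shard;
    }

    public static void main(String[] args) {
        Table table = new Table(new FixedHashStrategy(4));
        int first = table.mutate("row-1");
        int again = table.mutate("row-1");
        System.out.println(first == again); // true: same row, same shard
    }
}
```

Because placement is deterministic on the row id, an update to an existing row always reaches the shard that already holds it.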
> Pluggable sharding strategy
> ---------------------------
>
> Key: BLUR-55
> URL: https://issues.apache.org/jira/browse/BLUR-55
> Project: Apache Blur
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: Tim Williams
>
> Sharding in the 0.2-dev code is currently driven from the client. We should
> make the sharding strategy pluggable so that someone who needs something more
> than a typical docid modulo the shard count can do it.
> From Aaron's response[1]:
> "So, here are a couple of strategies that I have been thinking about.
> -Hash based, where it would hash on a pre-configured field. The field would
> not be allowed to be null and the number of shards would be fixed. Also, the
> shard placement provided by the user would be ignored.
> -User based, where the user has total control over the placement of the
> document by providing it during indexing. If a shard index is provided in an
> update and the current table does not contain that shard, then a new one
> would be created and added to the table.
> For now, we are somewhere in between. The number of shards is fixed
> and it's up to the user to provide the shard index. I think (need to look at
> the code) that if the user provides -1, it randomly chooses a shard for the
> document. That could be dangerous for updates. We should create a JIRA issue
> to discuss further and provide a better implementation."
> [1] http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201301.mbox/%3CC671051A-11E8-4721-AC95-D902250E3EA9%40gmail.com%3E
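The two strategies in the quoted mail can be sketched as follows. This is only a sketch: `ShardStrategies`, `HashOnField` and `UserPlacement` are hypothetical names, and a document is modeled as a plain field map. The hash-based variant rejects a null field and ignores any user-supplied index. The user-based variant registers a shard it has not seen before, standing in for creating a new shard and adding it to the table.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketches of the two strategies described in the quoted mail.
// Class and method names are illustrative, not part of Blur.
final class ShardStrategies {

    /** Hash-based: hashes a pre-configured field over a fixed shard count. */
    static final class HashOnField {
        private final String field;
        private final int shardCount;

        HashOnField(String field, int shardCount) {
            this.field = Objects.requireNonNull(field);
            if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
            this.shardCount = shardCount;
        }

        /** The user-provided shard index is ignored; the field must not be null. */
        int shardFor(Map<String, String> document, Integer ignoredUserShard) {
            String value = document.get(field);
            if (value == null) {
                throw new IllegalArgumentException("Field '" + field + "' must not be null");
            }
            return Math.floorMod(value.hashCode(), shardCount);
        }
    }

    /** User-based: the caller picks the shard; an unknown shard is created. */
    static final class UserPlacement {
        private final Set<Integer> shards = new TreeSet<>();

        UserPlacement(int initialShards) {
            for (int i = 0; i < initialShards; i++) shards.add(i);
        }

        int shardFor(int userShard) {
            if (userShard < 0) throw new IllegalArgumentException("a shard index is required");
            // Stand-in for creating the shard; Set.add is a no-op if it already exists.
            shards.add(userShard);
            return userShard;
        }

        int shardCount() { return shards.size(); }
    }

    public static void main(String[] args) {
        HashOnField hash = new HashOnField("rowId", 4);
        Map<String, String> doc = Map.of("rowId", "abc");
        // Placement depends only on the field, not on the index the user passed.
        System.out.println(hash.shardFor(doc, 3) == hash.shardFor(doc, 0)); // true

        UserPlacement user = new UserPlacement(2); // shards {0, 1}
        user.shardFor(5);                          // shard 5 is created on demand
        System.out.println(user.shardCount());     // 3
    }
}
```

The hash-based variant keeps updates safe because the same field value always lands on the same shard. The user-based variant trades that guarantee for full control over placement.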
--
This message is automatically generated by JIRA.