[ 
https://issues.apache.org/jira/browse/BLUR-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558267#comment-13558267
 ] 

Aaron McCurry commented on BLUR-55:
-----------------------------------

-I wonder if there's a real use-case for a client-determined per update shard 
assignment? In other words, I like the idea of a pluggable sharding strategy, 
but I don't see why it's not pluggable on the server conf - or a part of the 
table descriptor maybe?

This is exactly what I was thinking; that's how the properties hanging off the 
table descriptor should be used.  Also, a note on internal implementation: these 
sorts of table-specific things should be configured in the 
TableContext.  If you take a look at that object you will see the other 
options that are implemented there.

-If we do keep the current approach, it seems like making the shard assignment 
per-document would be better than on the MutationOption (where "better" means 
more client-friendly:)) ?

Agreed, this is the basic flaw that I was describing in the previous comment.

-Though, maybe I'm missing something because I don't yet understand how the 
mutation shardindex assignment marries with the retrieval (doc thrift 
function). Is the shard assignment somehow packed in the docLocation?

It is, that's why it's a long.  The first 32 bits are the shard id, and the 
second 32 bits are the Lucene doc id.
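
As a rough illustration of that packing, here is a minimal sketch.  It assumes 
"first" means the high-order 32 bits; the class and method names (DocLocation, 
pack, shardId, luceneDocId) are hypothetical and not taken from the Blur code.

```java
// Hypothetical helper, not the actual Blur implementation.
// Assumption: the shard id sits in the high-order 32 bits of the long
// and the Lucene doc id in the low-order 32 bits.
final class DocLocation {
    private DocLocation() {}

    // Combine a shard id and a Lucene doc id into a single long.
    static long pack(int shardId, int luceneDocId) {
        // Mask the doc id so its sign bit does not spill into the shard half.
        return ((long) shardId << 32) | (luceneDocId & 0xFFFFFFFFL);
    }

    // Recover the shard id from the high-order 32 bits.
    static int shardId(long docLocation) {
        return (int) (docLocation >>> 32);
    }

    // Recover the Lucene doc id from the low-order 32 bits.
    static int luceneDocId(long docLocation) {
        return (int) docLocation;
    }
}
```

With this layout, the retrieval side can route a doc request to the right 
shard using only the docLocation value, with no extra lookup.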


Overall I would like to remove the shardIndex from the api during mutates 
altogether.  And because the strategy would likely control the number of shards 
in a table we should try to remove the number of shards defined in the 
TableDescriptor object.
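
To make the idea concrete, a strategy that owns the shard count could look 
something like the sketch below, modeled on the hash-based option quoted in 
the issue description.  Everything here is an assumption for discussion: the 
ShardingStrategy interface, the FieldHashShardingStrategy class, and the use 
of a plain Map for a document's fields are hypothetical and do not exist in 
Blur.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical interface: the strategy, not the client, decides placement
// and reports how many shards the table has.
interface ShardingStrategy {
    int shardCount();
    int shardFor(Map<String, String> fields);
}

// Hash-based strategy: hashes a pre-configured field over a fixed shard count.
final class FieldHashShardingStrategy implements ShardingStrategy {
    private final String field;
    private final int shardCount;

    FieldHashShardingStrategy(String field, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.field = Objects.requireNonNull(field, "field");
        this.shardCount = shardCount;
    }

    @Override
    public int shardCount() {
        return shardCount;
    }

    @Override
    public int shardFor(Map<String, String> fields) {
        String value = fields.get(field);
        if (value == null) {
            // The configured field is not allowed to be null.
            throw new IllegalArgumentException("missing required field: " + field);
        }
        // floorMod keeps the result in [0, shardCount) even for negative hashes.
        return Math.floorMod(value.hashCode(), shardCount);
    }
}
```

Because placement depends only on the field value, an update to an existing 
document lands on the same shard as the original, which a client-supplied or 
random shard index cannot guarantee.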
                
> Pluggable sharding strategy
> ---------------------------
>
>                 Key: BLUR-55
>                 URL: https://issues.apache.org/jira/browse/BLUR-55
>             Project: Apache Blur
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: Tim Williams
>
> The 0.2-dev code currently is driven from the client.  We should make the 
> sharding strategy pluggable so that someone who needs something more than a 
> typical modulo on the docid over the shard count can do it. 
> From Aaron's response[1]:
> "So a couple of strategies that I have been thinking about.
> -Hash based where it would hash on a pre-configured field.  Field would not 
> be allowed to be null and the number of shards would be fixed.  Also the 
> shard placement provided by the user would be ignored.
> -User based where the user has total control over the placement of the 
> document by providing it during indexing.  If a shard index is provided in an 
> update and the current table does not contain that shard, then a new one 
> would be created and added to the table.
> As of now we are somewhere in between.  The number of shards is fixed 
> and it's up to the user to provide the shard index.  I think (need to look at 
> the code) if the user provides a -1 then it randomly chooses a shard for the 
> document. It could be dangerous for updates. We should create a jira issue 
> to discuss further and provide a better implementation."
> [1] - 
> http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201301.mbox/%3CC671051A-11E8-4721-AC95-D902250E3EA9%40gmail.com%3E

