[ https://issues.apache.org/jira/browse/BLUR-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558262#comment-13558262 ]
Aaron McCurry commented on BLUR-55:
-----------------------------------
As a follow up to the current implementation:
  private int getShardIndex(MutateOptions options) {
    int shardIndex = options.getShardIndex();
    if (shardIndex < 0) {
      // @TODO this is going to be very slow
      TableDescriptor tableDescriptor = _clusterStatus.getTableDescriptor(true, options.getTable());
      int shardCount = tableDescriptor.getShardCount();
      Random random = new Random();
      return random.nextInt(shardCount);
    }
    return shardIndex;
  }
So if the shard index is not set (-1 should be the default), it will pick a
random shard for the mutate. However, this is broken: if the mutate is a
delete or an update, it will likely be sent to the wrong shard. In its current
implementation, deletes should be broadcast to all shards; for updates, the
delete half should be broadcast and the update should be sent to a single
shard.
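
A rough sketch of the routing described above, in case it helps the
discussion. MutateKind and MutateRouter are hypothetical names, not existing
Blur classes, and it assumes that an explicitly provided shard index can be
trusted for the delete half as well. Only the unset (-1) case broadcasts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical names throughout: none of these types exist in Blur.
enum MutateKind { ADD, DELETE, UPDATE }

final class MutateRouter {
  private final int shardCount;
  // One Random shared across calls instead of a new instance per mutate.
  private final Random random;

  MutateRouter(int shardCount, Random random) {
    if (shardCount <= 0) {
      throw new IllegalArgumentException("shardCount must be positive");
    }
    this.shardCount = shardCount;
    this.random = random;
  }

  /** Shards that must receive the delete half of the mutate. */
  List<Integer> deleteTargets(MutateKind kind, int shardIndex) {
    List<Integer> targets = new ArrayList<>();
    if (kind == MutateKind.ADD) {
      return targets; // a plain add has nothing to delete
    }
    if (shardIndex >= 0) {
      // Assumption: the caller knows where the document lives.
      targets.add(shardIndex);
      return targets;
    }
    // Shard unknown: broadcast the delete to every shard.
    for (int i = 0; i < shardCount; i++) {
      targets.add(i);
    }
    return targets;
  }

  /** Single shard that receives the new document, or -1 for a pure delete. */
  int writeTarget(MutateKind kind, int shardIndex) {
    if (kind == MutateKind.DELETE) {
      return -1;
    }
    return shardIndex >= 0 ? shardIndex : random.nextInt(shardCount);
  }
}
```

With four shards and no shard index, an UPDATE would get delete targets
0 through 3 and exactly one write target, while an ADD would get no delete
targets at all.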
> Pluggable sharding strategy
> ---------------------------
>
> Key: BLUR-55
> URL: https://issues.apache.org/jira/browse/BLUR-55
> Project: Apache Blur
> Issue Type: Bug
> Affects Versions: 0.2.0
> Reporter: Tim Williams
>
> The 0.2-dev code currently is driven from the client. We should make the
> sharding strategy pluggable so that someone who needs something more than a
> typical modulo on the docid over the shard count can do it.
> From Aaron's response[1]:
> "So a couple of strategies that I have been thinking about.
> -Hash based where it would hash on a pre-configured field. Field would not
> be allowed to be null and the number of shards would be fixed. Also the
> shard placement provided by the user would be ignored.
> -User based where the user has total control over the placement of the
> document by providing it during indexing. If a shard index is provided in an
> update and the current table does not contain that shard, then a new one
> would be created and added to the table.
> As of now we are somewhere in between. The number of shards is fixed
> and it's up to the user to provide the shard index. I think (need to look at
> the code) if the user provides a -1 then it randomly chooses a shard for the
> document. It could be dangerous for updates. We should create a jira issue
> to discuss further and provide a better implementation."
> [1] -
> http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201301.mbox/%3CC671051A-11E8-4721-AC95-D902250E3EA9%40gmail.com%3E
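
The two strategies in the issue description could sit behind one small
interface. The following is only a sketch to make the idea concrete:
ShardingStrategy, HashShardingStrategy and UserShardingStrategy are
hypothetical names, and String.hashCode() stands in for whatever hash
function would actually be chosen.

```java
import java.util.Objects;

// Hypothetical interface: Blur does not define these names.
interface ShardingStrategy {
  /**
   * @param fieldValue value of the pre-configured sharding field
   * @param requestedShardIndex shard index supplied by the user, or -1
   * @param shardCount current number of shards in the table
   */
  int shardFor(String fieldValue, int requestedShardIndex, int shardCount);
}

// Hash based: fixed shard count, null field rejected, user placement ignored.
final class HashShardingStrategy implements ShardingStrategy {
  @Override
  public int shardFor(String fieldValue, int requestedShardIndex, int shardCount) {
    Objects.requireNonNull(fieldValue, "sharding field must not be null");
    // floorMod keeps the result in [0, shardCount) for negative hash codes.
    return Math.floorMod(fieldValue.hashCode(), shardCount);
  }
}

// User based: the caller has total control over placement.
final class UserShardingStrategy implements ShardingStrategy {
  @Override
  public int shardFor(String fieldValue, int requestedShardIndex, int shardCount) {
    if (requestedShardIndex < 0) {
      throw new IllegalArgumentException("a shard index must be provided");
    }
    // May be >= shardCount; creating the missing shard is left to the caller.
    return requestedShardIndex;
  }
}
```

Because the hash strategy derives the shard from the document itself, a
delete or update for the same field value lands on the same shard as the
original add, which avoids the broadcast needed with random placement.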