[
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Garski updated SOLR-2592:
---------------------------------
Attachment: pluggable_sharding.patch
This patch is intended to be a cocktail napkin sketch to get feedback (as such,
forwarding queries to the appropriate shards is not yet implemented). I can
iterate on this as needed.
The attached patch is a very simple implementation of pluggable sharding which
works as follows:
1. Configure a ShardingStrategy in SolrConfig under config/shardingStrategy. If
none is configured, the default implementation, which shards on the document's
unique id, is used.
{code:xml}
<shardingStrategy class="solr.UniqueIdShardingStrategy"/>
{code}
2. The ShardingStrategy accepts an AddUpdateCommand, DeleteUpdateCommand, or
SolrParams and returns a BytesRef that is hashed to determine the destination
slice.
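To make the shape of step 2 concrete, here is a minimal sketch that does not
depend on Solr or Lucene. Everything in it is an assumption for illustration:
ShardKeyStrategy, UniqueIdShardKeyStrategy and SliceRouter are hypothetical
names, a Map stands in for the update command, byte[] stands in for BytesRef,
and Arrays.hashCode is only a placeholder for whatever hash the patch uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

// Hypothetical stand-in for the patch's ShardingStrategy. The real one accepts
// AddUpdateCommand / DeleteUpdateCommand / SolrParams and returns a BytesRef.
interface ShardKeyStrategy {
    byte[] shardKey(Map<String, String> docFields);
}

// Mirrors the default behaviour described above: shard on the unique id.
final class UniqueIdShardKeyStrategy implements ShardKeyStrategy {
    private final String uniqueKeyField;

    UniqueIdShardKeyStrategy(String uniqueKeyField) {
        this.uniqueKeyField = uniqueKeyField;
    }

    @Override
    public byte[] shardKey(Map<String, String> docFields) {
        String id = docFields.get(uniqueKeyField);
        if (id == null) {
            throw new IllegalArgumentException("missing unique key field: " + uniqueKeyField);
        }
        return id.getBytes(StandardCharsets.UTF_8);
    }
}

final class SliceRouter {
    private SliceRouter() {}

    // Placeholder hash (assumption): any stable hash of the key bytes would do.
    static int sliceFor(byte[] key, int numSlices) {
        if (numSlices <= 0) {
            throw new IllegalArgumentException("numSlices must be positive");
        }
        // floorMod keeps the result in [0, numSlices) even for negative hashes.
        return Math.floorMod(Arrays.hashCode(key), numSlices);
    }
}
```

A different strategy (say, one keyed on a customer field) would only need to
return different bytes; the routing step stays the same.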
3. I have only implemented updates at this time; queries are still distributed
across all shards in the collection. I have added a 'shard.keys' parameter to
common.params.ShardParams. It would contain the value(s) to be hashed to
determine the shard(s) to query within the HttpShardHandler.checkDistributed
method. If 'shard.keys' does not have a value, the query would be distributed
across all shards in the collection.
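The intended 'shard.keys' behaviour could look roughly like the sketch below.
It is not the code in the patch: ShardKeysResolver is a hypothetical name, the
comma as the separator between multiple values is an assumption, and
Arrays.hashCode is again only a placeholder hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

final class ShardKeysResolver {
    private ShardKeysResolver() {}

    // Returns the slice indexes a query should be sent to.
    // Assumption: multiple shard.keys values are comma separated.
    static SortedSet<Integer> slicesToQuery(String shardKeys, int numSlices) {
        if (numSlices <= 0) {
            throw new IllegalArgumentException("numSlices must be positive");
        }
        SortedSet<Integer> slices = new TreeSet<>();
        if (shardKeys != null) {
            for (String key : shardKeys.split(",")) {
                String trimmed = key.trim();
                if (!trimmed.isEmpty()) {
                    byte[] bytes = trimmed.getBytes(StandardCharsets.UTF_8);
                    // Placeholder hash (assumption), reduced to a slice index.
                    slices.add(Math.floorMod(Arrays.hashCode(bytes), numSlices));
                }
            }
        }
        // No usable shard.keys value: fall back to querying every slice.
        if (slices.isEmpty()) {
            for (int i = 0; i < numSlices; i++) {
                slices.add(i);
            }
        }
        return slices;
    }
}
```

Two keys that hash to the same slice collapse into a single entry, so a query
is never sent to the same slice twice.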
Notes:
There are no unit tests yet; however, all existing tests pass.
I am not quite sure about the configuration location within solr config.
However, as sharding is used by both update and search requests, placing it in
the updateHandler and (potentially multiple) requestHandler sections would
require duplicating the same information in the solr config for what I
believe is more of a collection-wide configuration.
As hashing currently requires the lucene.util.BytesRef class, the solrj client
cannot currently hash a request in order to send it to a specific node
without solrj adding a dependency on lucene core, something that is most
likely not desired. Additionally, hashing on a unique id requires access
to the schema to determine the field that contains the unique id. Are
there any thoughts on how to alter the hashing to remove these dependencies and
allow solrj to be a 'smart' client that submits requests directly to the nodes
that contain the data?
How would solrj work when a request includes multiple updates that belong to
different shards? Send the request to one of the nodes and let the server
distribute them to the proper nodes? Perform concurrent requests to the
specific nodes?
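To illustrate the second option only (per-node requests), a client could group
the updates by destination slice before sending. This is a sketch under
assumptions, not a proposal for the solrj API: UpdateBatcher is a hypothetical
name, the caller is assumed to already know each document's unique id (which
sidesteps the schema lookup), and the hash works on plain bytes with JDK
classes only, so it would have to match whatever the server uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class UpdateBatcher {
    private UpdateBatcher() {}

    // Groups unique ids by slice index so each group can go out as one request.
    static Map<Integer, List<String>> groupBySlice(List<String> uniqueIds, int numSlices) {
        if (numSlices <= 0) {
            throw new IllegalArgumentException("numSlices must be positive");
        }
        Map<Integer, List<String>> batches = new TreeMap<>();
        for (String id : uniqueIds) {
            byte[] key = id.getBytes(StandardCharsets.UTF_8);
            // Placeholder hash (assumption); client and server must agree on it.
            int slice = Math.floorMod(Arrays.hashCode(key), numSlices);
            batches.computeIfAbsent(slice, s -> new ArrayList<>()).add(id);
        }
        return batches;
    }
}
```

Each map entry could then be sent concurrently to a node hosting that slice,
while the first option needs none of this on the client.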
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
> Key: SOLR-2592
> URL: https://issues.apache.org/jira/browse/SOLR-2592
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Affects Versions: 4.0
> Reporter: Noble Paul
> Attachments: pluggable_sharding.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash,
> attribute value, etc.), it will be easy to narrow down the search to a smaller
> subset of shards and in effect achieve a more efficient search.