[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Garski updated SOLR-2592:
---------------------------------

    Attachment: pluggable_sharding.patch

This patch is intended as a cocktail-napkin sketch to gather feedback; as such, 
forwarding queries to the appropriate shards is not yet implemented. I can 
iterate on this as needed.

The attached patch is a very simple implementation of pluggable sharding which 
works as follows:

1. Configure a ShardingStrategy in solrconfig.xml under config/shardingStrategy. 
If none is configured, the default implementation, sharding on the document's 
unique id, is used.
{code:xml} 
     <shardingStrategy class="solr.UniqueIdShardingStrategy"/>
{code} 
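As a rough sketch of the fallback rule in step 1, assuming a simplified, hypothetical `ShardingStrategy` interface that maps an id to key bytes (the patch's real interface works on update commands and returns a `BytesRef`) and a hypothetical `ShardingStrategyLoader` standing in for Solr's plugin loading:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for the patch's ShardingStrategy: maps a
// document's unique id to the bytes that will be hashed. Not the real Solr API.
interface ShardingStrategy {
    byte[] shardKey(String uniqueId);
}

// The default behaviour described above: shard on the document's unique id.
class UniqueIdShardingStrategy implements ShardingStrategy {
    @Override
    public byte[] shardKey(String uniqueId) {
        return uniqueId.getBytes(StandardCharsets.UTF_8);
    }
}

final class ShardingStrategyLoader {
    // Hypothetical registry of values allowed in <shardingStrategy class="..."/>;
    // a real implementation would go through Solr's resource loader instead.
    private static final Map<String, Supplier<ShardingStrategy>> KNOWN =
        Map.of("solr.UniqueIdShardingStrategy", UniqueIdShardingStrategy::new);

    /** Returns the configured strategy, or the unique-id default when none is configured. */
    static ShardingStrategy load(String configuredClass) {
        if (configuredClass == null || configuredClass.isEmpty()) {
            return new UniqueIdShardingStrategy();
        }
        Supplier<ShardingStrategy> factory = KNOWN.get(configuredClass);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown shardingStrategy: " + configuredClass);
        }
        return factory.get();
    }
}
```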

2. The ShardingStrategy accepts an AddUpdateCommand, a DeleteUpdateCommand, or 
SolrParams and returns a BytesRef that is hashed to determine the destination 
slice.
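The shape of step 2 might look roughly like the following. Everything here is hypothetical: `AddCmd`, `DeleteCmd` and a plain `Map` stand in for AddUpdateCommand, DeleteUpdateCommand and SolrParams, `byte[]` stands in for BytesRef, reading the key from a `shard.keys` entry in the params case is an assumption, and CRC32 modulo the slice count is a placeholder for whatever hash and slice assignment Solr actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical stand-ins for AddUpdateCommand and DeleteUpdateCommand;
// the real Solr classes carry far more state.
final class AddCmd {
    final Map<String, String> fields;
    AddCmd(Map<String, String> fields) { this.fields = fields; }
}

final class DeleteCmd {
    final String id;
    DeleteCmd(String id) { this.id = id; }
}

// Mirrors step 2: each kind of input yields the bytes to hash
// (byte[] stands in for Lucene's BytesRef, Map for SolrParams).
interface KeyedShardingStrategy {
    byte[] keyFor(AddCmd cmd);
    byte[] keyFor(DeleteCmd cmd);
    byte[] keyFor(Map<String, String> params);
}

final class IdKeyStrategy implements KeyedShardingStrategy {
    private final String uniqueKeyField; // in Solr this would come from the schema

    IdKeyStrategy(String uniqueKeyField) { this.uniqueKeyField = uniqueKeyField; }

    public byte[] keyFor(AddCmd cmd) { return utf8(cmd.fields.get(uniqueKeyField)); }
    public byte[] keyFor(DeleteCmd cmd) { return utf8(cmd.id); }
    // Assumption: a single key passed as a "shard.keys" parameter.
    public byte[] keyFor(Map<String, String> params) { return utf8(params.get("shard.keys")); }

    private static byte[] utf8(String s) {
        if (s == null) {
            throw new IllegalArgumentException("no shard key present");
        }
        return s.getBytes(StandardCharsets.UTF_8);
    }
}

final class SliceHasher {
    // CRC32 is a placeholder hash; getValue() is non-negative, so the result
    // always lies in [0, numSlices).
    static int sliceFor(byte[] key, int numSlices) {
        CRC32 crc = new CRC32();
        crc.update(key, 0, key.length);
        return (int) (crc.getValue() % numSlices);
    }
}
```

Because an add and a delete for the same id produce the same key bytes, both land on the same slice, which is the property the strategy has to preserve.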

3. Only updates are implemented at this time; queries are still distributed 
across all shards in the collection. I have added a 'shard.keys' parameter to 
common.params.ShardParams that would contain the value(s) to be hashed within 
the HttpShardHandler.checkDistributed method to determine which shard(s) to 
query. If 'shard.keys' has no value, the query would be distributed across all 
shards in the collection.
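A minimal sketch of the proposed 'shard.keys' handling, assuming comma-separated keys and a placeholder CRC32 hash; `ShardKeysResolver`, `shardsToQuery` and `hashToShard` are hypothetical names, not part of HttpShardHandler:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.zip.CRC32;

// Hypothetical helper; not part of HttpShardHandler or ShardParams.
final class ShardKeysResolver {
    /**
     * Returns the indices of the shards to query. A missing or blank shard.keys
     * value falls back to every shard in the collection.
     */
    static List<Integer> shardsToQuery(String shardKeysParam, int numShards) {
        TreeSet<Integer> targets = new TreeSet<>(); // de-duplicates and sorts
        if (shardKeysParam != null) {
            for (String key : shardKeysParam.split(",")) { // comma separator is an assumption
                String k = key.trim();
                if (!k.isEmpty()) {
                    targets.add(hashToShard(k, numShards));
                }
            }
        }
        if (targets.isEmpty()) {
            List<Integer> all = new ArrayList<>();
            for (int i = 0; i < numShards; i++) {
                all.add(i);
            }
            return all;
        }
        return new ArrayList<>(targets);
    }

    // CRC32 over the key's UTF-8 bytes: a placeholder for the strategy's real hash.
    static int hashToShard(String key, int numShards) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return (int) (crc.getValue() % numShards);
    }
}
```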

Notes:

There are no unit tests yet; however, all existing tests pass.

I am not sure about the configuration location within solrconfig.xml. However, 
because sharding is used by both update and search requests, placing it in the 
updateHandler and (potentially multiple) requestHandler sections would require 
duplicating the same information in solrconfig.xml for what I believe is really 
a collection-wide setting.

Because hashing currently requires the lucene.util.BytesRef class, the solrj 
client cannot hash a request to send it to a specific node without adding a 
dependency on lucene core, which is most likely undesirable. Additionally, 
hashing on the unique id requires access to the schema to determine which field 
contains the unique id. Are there any thoughts on how to alter the hashing to 
remove these dependencies and allow solrj to be a 'smart' client that submits 
requests directly to the nodes that contain the data?
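One possible direction, offered only as a sketch: if the server hashed the plain UTF-8 bytes of the key and published the unique key field name and shard order (for example alongside the cluster state), a client could route using only the JDK. `ClientSideRouter`, the CRC32 placeholder hash and the shard URLs used with it are all assumptions, not existing solrj API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/**
 * Hypothetical client-side router that needs only the JDK: no BytesRef and no
 * schema. The unique key field name and the ordered shard URLs are assumed to be
 * supplied by the caller rather than read from the schema.
 */
final class ClientSideRouter {
    private final String uniqueKeyField;
    private final List<String> shardUrls;

    ClientSideRouter(String uniqueKeyField, List<String> shardUrls) {
        this.uniqueKeyField = uniqueKeyField;
        this.shardUrls = List.copyOf(shardUrls);
    }

    /** Picks the shard URL for a document represented as a field-to-value map. */
    String route(Map<String, String> doc) {
        String id = doc.get(uniqueKeyField);
        if (id == null) {
            throw new IllegalArgumentException("document has no " + uniqueKeyField);
        }
        // Hash plain UTF-8 bytes; the server must apply the identical function
        // (CRC32 here is only a placeholder) or client and server will disagree.
        byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return shardUrls.get((int) (crc.getValue() % shardUrls.size()));
    }
}
```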

How would solrj handle a single request containing multiple updates that belong 
to different shards? Send the request to one of the nodes and let the server 
distribute the updates to the proper nodes? Or perform concurrent requests to 
the specific nodes?
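The second option, concurrent per-shard requests, could be sketched as grouping the documents by owning shard and sending one batch per shard in parallel. `BatchSplitter` is hypothetical, and the `send` function stands in for an actual HTTP update request; a failure on any shard surfaces as an exception rather than being retried.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;
import java.util.function.Function;

final class BatchSplitter {
    /** Groups documents by owning shard, preserving input order within each group. */
    static <D> Map<String, List<D>> groupByShard(List<D> docs, Function<D, String> shardOf) {
        Map<String, List<D>> groups = new LinkedHashMap<>();
        for (D doc : docs) {
            groups.computeIfAbsent(shardOf.apply(doc), k -> new ArrayList<>()).add(doc);
        }
        return groups;
    }

    /** Sends one batch per shard concurrently and returns the per-shard results. */
    static <D, R> Map<String, R> sendConcurrently(Map<String, List<D>> groups,
                                                  BiFunction<String, List<D>, R> send) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, groups.size()));
        try {
            Map<String, Future<R>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, List<D>> e : groups.entrySet()) {
                pending.put(e.getKey(), pool.submit(() -> send.apply(e.getKey(), e.getValue())));
            }
            Map<String, R> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<R>> e : pending.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    // Surface the first failing shard; no retry or partial-success handling.
                    throw new IllegalStateException("update to " + e.getKey() + " failed", ex.getCause());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for shards", ex);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off, roughly: forwarding through one node keeps the client simple at the cost of an extra hop, while this approach saves the hop but leaves partial-failure handling to the client.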


                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: pluggable_sharding.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value, etc.), it will be easy to narrow down the search to a smaller 
> subset of shards and, in effect, achieve more efficient search.


        
