[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530162#comment-13530162
 ] 

Shawn Heisey commented on SOLR-2592:
------------------------------------

I use the hot shard concept in Solr 3.5.0.  For the cold shards, I split 
documents using a MOD on the CRC32 hash of a MySQL bigint autoincrement field - 
my MySQL query does the CRC32 and the MOD.  That field's actual value is 
translated to a tlong field in the schema.  For the hot shard, I simply use a 
split point on the actual value of that field.  Everything less than or equal 
to the split point goes to the cold shards; everything greater than the split 
point goes to the hot shard.  Multiple shards are handled by a single Solr 
instance - seven shards live on two servers.
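
A minimal sketch of the routing rule described above, written in Python for illustration only. The count of six cold shards (seven shards minus one hot shard) and the names `cold_shard_for` and `route` are assumptions, not taken from the actual setup. It also assumes that MySQL's CRC32() hashes the decimal string form of a numeric argument, which `zlib.crc32` over the ASCII digits would reproduce.

```python
import zlib

# Assumption: seven shards total, one of which is the hot shard.
NUM_COLD_SHARDS = 6


def cold_shard_for(doc_id: int, num_cold: int = NUM_COLD_SHARDS) -> int:
    """Return the cold shard index for an id: CRC32 of the id, then MOD."""
    # Hash the decimal string of the id, mirroring MOD(CRC32(id), n) in SQL.
    return zlib.crc32(str(doc_id).encode("ascii")) % num_cold


def route(doc_id: int, split_point: int) -> str:
    """Ids above the split point go to the hot shard, the rest to a cold shard."""
    if doc_id > split_point:
        return "hot"
    return f"cold{cold_shard_for(doc_id)}"
```

Because the hot shard is chosen by a plain comparison on the field value, moving the split point changes only which ids count as "hot"; the cold shard for any given id stays the same.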

This arrangement requires that I do a daily "distribute" process where I index 
(from MySQL) data between the old split point and the new split point to the 
cold shards, then delete that data from the hot shard. Full reindexes are done 
with the dataimport handler and controlled by SolrJ; everything else (including 
the distribute) is done directly with SolrJ.
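
The daily distribute step could be planned roughly as follows. This is a sketch under stated assumptions, not the actual SolrJ code: the function name `plan_distribute`, the field name `id_field`, and the default of six cold shards are all hypothetical. It only computes which ids go to which cold shard and the delete query for the hot shard; the indexing and delete requests themselves are left out.

```python
import zlib
from collections import defaultdict


def plan_distribute(ids, old_split, new_split, num_cold=6, id_field="id_field"):
    """Group ids in (old_split, new_split] by cold shard and build the
    range query that would remove the same ids from the hot shard."""
    if new_split <= old_split:
        # Nothing to move when the split point has not advanced.
        return {}, None

    batches = defaultdict(list)
    for doc_id in ids:
        if old_split < doc_id <= new_split:
            # Same CRC32-then-MOD rule assumed for cold shard placement.
            shard = zlib.crc32(str(doc_id).encode("ascii")) % num_cold
            batches[shard].append(doc_id)

    # Inclusive integer range covering exactly the ids that were moved.
    delete_query = f"{id_field}:[{old_split + 1} TO {new_split}]"
    return dict(batches), delete_query
```

As in the process described above, the batches would be indexed into the cold shards first, and the delete query sent to the hot shard only after that succeeds, so no document is missing from the index in between.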

How much of that could be automated and put server-side with the features added 
by this issue?  If I have to track shard and core names myself in order to do 
the distribute, then I will have to decide whether the other automation I would 
gain is worth switching to SolrCloud.

If I could avoid the client-side distribute indexing and have Solr shuffle the 
data around itself, that would be awesome, but I'm not sure that's possible, 
and it may be somewhat complicated by the fact that I have a number of unstored 
fields that I search on.

At some point I will test performance on an index where I do not have a hot 
shard, where the data is simply hashed between several large shards.  This 
entire concept was implemented for fast indexing of new data - because Solr 1.4 
did not have NRT features.

                
> Custom Hashing
> --------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0-ALPHA
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 4.1
>
>         Attachments: dbq_fix.patch, pluggable_sharding.patch, 
> pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, 
> SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, 
> SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, 
> SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, 
> attribute value, etc.), it will be easy to narrow down the search to a smaller 
> subset of shards and in effect achieve more efficient search.  
