[jira] [Commented] (SOLR-13399) compositeId support for shard splitting

Hoss Man (JIRA) Thu, 08 Aug 2019 16:29:04 -0700


    [ https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903409#comment-16903409 ]


Hoss Man commented on SOLR-13399:
---------------------------------

I would assume it's related to the (numSubShards) changes in SplitShardCmd?

At first glance, that code path looks like it's specific to SPLIT_BY_PREFIX, 
but apparently your previous commit has it defaulting to "true"? (see 
SplitShardCmd.java L212)
{noformat}
$ git show 19ddcfd282f3b9eccc50da83653674e510229960 -- core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java | cat
commit 19ddcfd282f3b9eccc50da83653674e510229960
Author: yonik <[email protected]>
Date:   Tue Aug 6 14:09:54 2019 -0400

    SOLR-13399: ability to use id field for compositeId histogram

diff --git a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
index 4d623be..6c5921e 100644
--- a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
+++ b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
@@ -212,16 +212,14 @@ public class SplitShardCmd implements OverseerCollectionMessageHandler.Cmd {
       if (message.getBool(CommonAdminParams.SPLIT_BY_PREFIX, true)) {
         t = timings.sub("getRanges");
 
-        log.info("Requesting split ranges from replica " + parentShardLeader.getName() + " as part of slice " + slice + " of collection "
-            + collectionName + " on " + parentShardLeader);
-
         ModifiableSolrParams params = new ModifiableSolrParams();
         params.set(CoreAdminParams.ACTION, CoreAdminParams.CoreAdminAction.SPLIT.toString());
         params.set(CoreAdminParams.GET_RANGES, "true");
         params.set(CommonAdminParams.SPLIT_METHOD, splitMethod.toLower());
         params.set(CoreAdminParams.CORE, parentShardLeader.getStr("core"));
-        int numSubShards = message.getInt(NUM_SUB_SHARDS, DEFAULT_NUM_SUB_SHARDS);
-        params.set(NUM_SUB_SHARDS, Integer.toString(numSubShards));
+        // Only 2 is currently supported
+        // int numSubShards = message.getInt(NUM_SUB_SHARDS, DEFAULT_NUM_SUB_SHARDS);
+        // params.set(NUM_SUB_SHARDS, Integer.toString(numSubShards));
 
         {
           final ShardRequestTracker shardRequestTracker = ocmh.asyncRequestTracker(asyncId);
@@ -236,7 +234,7 @@ public class SplitShardCmd implements OverseerCollectionMessageHandler.Cmd {
             NamedList shardRsp = (NamedList)successes.getVal(0);
             String splits = (String)shardRsp.get(CoreAdminParams.RANGES);
             if (splits != null) {
-              log.info("Resulting split range to be used is " + splits);
+              log.info("Resulting split ranges to be used: " + splits + " slice=" + slice + " leader=" + parentShardLeader);
               // change the message to use the recommended split ranges
               message = message.plus(CoreAdminParams.RANGES, splits);
             }

{noformat}
 

(I could be totally off base, though: I don't really understand 90% of what 
this test is doing, and the place where it fails doesn't seem to be trying to 
split into more than 2 subshards, so even if the SplitShardCmd changes I 
pointed out are buggy, I'm not sure why they would cause this particular failure)
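To make the defaulting point concrete: a boolean lookup whose default is `true` turns the getRanges path on whenever the caller omits the parameter entirely. The following is a minimal, hypothetical illustration, not the real `message.getBool` implementation; `DefaultFlagSketch`, its `getBool` helper, and the `splitByPrefix` key (the parameter name proposed in the issue description) are assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch: mimics the shape of
// message.getBool(CommonAdminParams.SPLIT_BY_PREFIX, true) from the diff.
final class DefaultFlagSketch {

    // Returns the parsed value if the key is present, otherwise the supplied default.
    static boolean getBool(Map<String, String> params, String key, boolean def) {
        String v = params.get(key);
        return v == null ? def : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        // A plain SPLITSHARD request that never mentions splitByPrefix:
        // the default wins, so the "getRanges" block would run.
        System.out.println(getBool(Map.of(), "splitByPrefix", true));

        // Only an explicit "false" turns the prefix-based path off.
        System.out.println(getBool(Map.of("splitByPrefix", "false"), "splitByPrefix", true));
    }
}
```

With a default of `false` instead, the same parameter-less request would skip that block.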

 

> compositeId support for shard splitting
> ---------------------------------------
>
>                 Key: SOLR-13399
>                 URL: https://issues.apache.org/jira/browse/SOLR-13399
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SOLR-13399.patch, SOLR-13399.patch, 
> SOLR-13399_testfix.patch, SOLR-13399_useId.patch, 
> ShardSplitTest.master.seed_AE04B5C9BA6E9A4.log.txt
>
>
> Shard splitting does not currently have a way to automatically take into 
> account the actual distribution (number of documents) in each hash bucket 
> created by using compositeId hashing.
> We should probably add a parameter *splitByPrefix* to the *SPLITSHARD* 
> command that would look at the number of docs sharing each compositeId prefix 
> and use that to create roughly equal sized buckets by document count rather 
> than just assuming an equal distribution across the entire hash range.
> Like normal shard splitting, we should bias against splitting within hash 
> buckets unless necessary (since that leads to larger query fanout). Perhaps 
> this warrants a parameter that would control how much of a size mismatch is 
> tolerable before resorting to splitting within a bucket. 
> *allowedSizeDifference*?
> To more quickly calculate the number of docs in each bucket, we could index 
> the prefix in a different field.  Iterating over the terms for this field 
> would quickly give us the number of docs in each bucket (i.e., Lucene already 
> keeps track of the doc count for each term). Perhaps the implementation could 
> be a flag on the *id* field, something like *indexPrefixes*, together with 
> poly-fields, that would cause the indexing to be done automatically and avoid 
> having to pass in an additional field during indexing and during the call to 
> *SPLITSHARD*. This whole part is an optimization, though, and could be split 
> off into its own issue if desired.
>  
>  
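The per-prefix document counts that the description relies on can be sketched roughly as follows. This is a simplified illustration under stated assumptions: only the part of the id before the first `!` is treated as the prefix, ids without a `!` are skipped, and counts are kept in a `TreeMap` (alphabetical order), whereas real compositeId routing orders buckets by hash. `PrefixCountSketch` and its methods are hypothetical. With the prefix indexed in its own field, as the description proposes, the same numbers would come from each term's document frequency rather than a scan over ids.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: tally documents per compositeId prefix.
final class PrefixCountSketch {

    // "user1!doc7" -> "user1"; returns null when the id has no '!' separator.
    static String prefixOf(String id) {
        int sep = id.indexOf('!');
        return sep < 0 ? null : id.substring(0, sep);
    }

    // Counts ids per prefix. A dedicated prefix field would let Lucene supply
    // these numbers directly as per-term doc frequencies.
    static Map<String, Long> countByPrefix(List<String> ids) {
        Map<String, Long> counts = new TreeMap<>();
        for (String id : ids) {
            String prefix = prefixOf(id);
            if (prefix != null) {
                counts.merge(prefix, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {a=2, b=1}; "plain" has no prefix and is skipped.
        System.out.println(countByPrefix(List.of("a!1", "a!2", "b!1", "plain")));
    }
}
```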

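Given per-bucket counts in hash order, the "roughly equal sized buckets" idea and the proposed *allowedSizeDifference* tolerance can be sketched as a two-way split that prefers a boundary between buckets and only falls back to dividing a bucket when the best boundary is too lopsided. This is a minimal sketch assuming exactly two sub-shards and a tolerance expressed as a fraction of the total doc count; `PrefixSplitSketch`, `Bucket`, `Plan`, and that reading of *allowedSizeDifference* are hypothetical, not the committed SOLR-13399 code. Records need Java 16 or later.

```java
import java.util.List;

// Hypothetical sketch of a prefix-aware two-way split; not the committed code.
final class PrefixSplitSketch {

    // Doc count for one compositeId prefix; the list is assumed to be in hash-range order.
    record Bucket(String prefix, long docCount) {}

    // splitInsideBucket == false: buckets [0, boundary) go to the first sub-shard,
    // [boundary, n) to the second. splitInsideBucket == true: the bucket at
    // index 'boundary' straddles the midpoint and would itself have to be divided.
    record Plan(int boundary, boolean splitInsideBucket) {}

    static Plan plan(List<Bucket> buckets, double allowedSizeDifference) {
        long total = 0;
        for (Bucket b : buckets) {
            total += b.docCount();
        }

        // Try every boundary between adjacent buckets and keep the most balanced one.
        long left = 0;
        long bestDiff = Long.MAX_VALUE;
        int best = -1;
        for (int i = 1; i < buckets.size(); i++) {
            left += buckets.get(i - 1).docCount();
            long diff = Math.abs(2 * left - total); // |left - right|
            if (diff < bestDiff) {
                bestDiff = diff;
                best = i;
            }
        }

        // Accept a between-bucket split if its imbalance is within tolerance.
        if (best > 0 && total > 0 && (double) bestDiff / total <= allowedSizeDifference) {
            return new Plan(best, false);
        }

        // Otherwise pick the bucket containing the midpoint document.
        long running = 0;
        for (int i = 0; i < buckets.size(); i++) {
            running += buckets.get(i).docCount();
            if (2 * running >= total) {
                return new Plan(i, true);
            }
        }
        return new Plan(0, true); // empty input: nothing meaningful to split
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(
            new Bucket("a", 10), new Bucket("b", 40), new Bucket("c", 30), new Bucket("d", 20));
        // Prints Plan[boundary=2, splitInsideBucket=false]
        System.out.println(plan(buckets, 0.10));
    }
}
```

With the sample counts, the boundary after the second bucket splits 100 docs 50/50, so no bucket has to be divided; a single dominant bucket (say 5/90/5) would instead exceed the 10% tolerance and come back with `splitInsideBucket=true` at index 1.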


--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

