[
https://issues.apache.org/jira/browse/SOLR-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903409#comment-16903409
]
Hoss Man commented on SOLR-13399:
---------------------------------
I would assume it's related to the numSubShards changes in SplitShardCmd?
At first glance, that code path looks like it's specific to SPLIT_BY_PREFIX,
but apparently your previous commit has it defaulting to "true"? (See
SplitShardCmd.java L212.)
{noformat}
$ git show 19ddcfd282f3b9eccc50da83653674e510229960 -- core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java | cat
commit 19ddcfd282f3b9eccc50da83653674e510229960
Author: yonik <[email protected]>
Date: Tue Aug 6 14:09:54 2019 -0400

SOLR-13399: ability to use id field for compositeId histogram

diff --git a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
index 4d623be..6c5921e 100644
--- a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
+++ b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
@@ -212,16 +212,14 @@ public class SplitShardCmd implements OverseerCollectionMessageHandler.Cmd {
 if (message.getBool(CommonAdminParams.SPLIT_BY_PREFIX, true)) {
 t = timings.sub("getRanges");
- log.info("Requesting split ranges from replica " + parentShardLeader.getName() + " as part of slice " + slice + " of collection "
- + collectionName + " on " + parentShardLeader);
-
 ModifiableSolrParams params = new ModifiableSolrParams();
 params.set(CoreAdminParams.ACTION, CoreAdminParams.CoreAdminAction.SPLIT.toString());
 params.set(CoreAdminParams.GET_RANGES, "true");
 params.set(CommonAdminParams.SPLIT_METHOD, splitMethod.toLower());
 params.set(CoreAdminParams.CORE, parentShardLeader.getStr("core"));
- int numSubShards = message.getInt(NUM_SUB_SHARDS, DEFAULT_NUM_SUB_SHARDS);
- params.set(NUM_SUB_SHARDS, Integer.toString(numSubShards));
+ // Only 2 is currently supported
+ // int numSubShards = message.getInt(NUM_SUB_SHARDS, DEFAULT_NUM_SUB_SHARDS);
+ // params.set(NUM_SUB_SHARDS, Integer.toString(numSubShards));
 {
 final ShardRequestTracker shardRequestTracker = ocmh.asyncRequestTracker(asyncId);
@@ -236,7 +234,7 @@ public class SplitShardCmd implements OverseerCollectionMessageHandler.Cmd {
 NamedList shardRsp = (NamedList)successes.getVal(0);
 String splits = (String)shardRsp.get(CoreAdminParams.RANGES);
 if (splits != null) {
- log.info("Resulting split range to be used is " + splits);
+ log.info("Resulting split ranges to be used: " + splits + " slice=" + slice + " leader=" + parentShardLeader);
 // change the message to use the recommended split ranges
 message = message.plus(CoreAdminParams.RANGES, splits);
 }
{noformat}
(I could be totally off base though. I don't really understand 90% of what
this test is doing, and the place where it fails doesn't seem to be trying to
split into more than 2 subshards, so even if the SplitShardCmd changes I
pointed out are buggy, I'm not sure why they would cause this particular failure.)
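
To make the defaulting concern concrete, here is a minimal sketch of how a
getBool-style lookup with a default of true behaves. It uses a plain Map as a
hypothetical stand-in for the message (it is not Solr's actual message class),
and the key string "splitByPrefix" is assumed from the parameter name in the
issue description. The point is only that the branch is taken whenever the
caller does not set the parameter at all.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the message lookup; not Solr's real API. */
class SplitByPrefixDefault {
  // Mirrors the shape of message.getBool(key, defaultValue):
  // an absent key falls back to the supplied default.
  static boolean getBool(Map<String, String> message, String key, boolean def) {
    String value = message.get(key);
    return value == null ? def : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> message = new HashMap<>();
    // Parameter not supplied: the default decides, so the branch runs.
    System.out.println(getBool(message, "splitByPrefix", true));
    // Parameter explicitly disabled: the branch is skipped.
    message.put("splitByPrefix", "false");
    System.out.println(getBool(message, "splitByPrefix", true));
  }
}
```

If the default really is true, any SPLITSHARD request that omits the parameter
would go through the getRanges path shown in the diff above.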
> compositeId support for shard splitting
> ---------------------------------------
>
> Key: SOLR-13399
> URL: https://issues.apache.org/jira/browse/SOLR-13399
> Project: Solr
> Issue Type: New Feature
> Reporter: Yonik Seeley
> Assignee: Yonik Seeley
> Priority: Major
> Fix For: 8.3
>
> Attachments: SOLR-13399.patch, SOLR-13399.patch,
> SOLR-13399_testfix.patch, SOLR-13399_useId.patch,
> ShardSplitTest.master.seed_AE04B5C9BA6E9A4.log.txt
>
>
> Shard splitting does not currently have a way to automatically take into
> account the actual distribution (number of documents) in each hash bucket
> created by using compositeId hashing.
> We should probably add a parameter *splitByPrefix* to the *SPLITSHARD*
> command that would look at the number of docs sharing each compositeId prefix
> and use that to create roughly equal sized buckets by document count rather
> than just assuming an equal distribution across the entire hash range.
> Like normal shard splitting, we should bias against splitting within hash
> buckets unless necessary (since that leads to larger query fanout). Perhaps
> this warrants a parameter that would control how much of a size mismatch is
> tolerable before resorting to splitting within a bucket.
> *allowedSizeDifference*?
> To more quickly calculate the number of docs in each bucket, we could index
> the prefix in a different field. Iterating over the terms for this field
> would quickly give us the number of docs in each (i.e., Lucene keeps track of
> the doc count for each term already.) Perhaps the implementation could be a
> flag on the *id* field... something like *indexPrefixes* and poly-fields that
> would cause the indexing to be automatically done and alleviate having to
> pass in an additional field during indexing and during the call to
> *SPLITSHARD*. This whole part is an optimization though and could be split
> off into its own issue if desired.
>
>
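
The idea in the issue description, balancing sub-shards by document count per
compositeId prefix rather than by hash range, can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code from
the patch: the class and method names are hypothetical, the per-prefix counts
are assumed to arrive as an array already ordered by hash, and only a two-way
split that never cuts inside a bucket is considered.

```java
/** Illustrative only: picks a split boundary between prefix buckets by doc count. */
class PrefixSplitSketch {
  /**
   * Given doc counts per compositeId prefix (in hash order), returns how many
   * leading buckets go to the first sub-shard so that the two sides are as
   * close in document count as possible without splitting inside a bucket.
   * Returns 0 when there are fewer than two buckets, i.e. no boundary exists.
   */
  static int chooseSplitIndex(long[] docCounts) {
    long total = 0;
    for (long count : docCounts) {
      total += count;
    }
    long running = 0;
    long bestDiff = Long.MAX_VALUE;
    int bestIndex = 0;
    // Try every boundary between adjacent buckets and keep the most balanced one.
    for (int i = 0; i < docCounts.length - 1; i++) {
      running += docCounts[i];
      long diff = Math.abs((total - running) - running);
      if (diff < bestDiff) {
        bestDiff = diff;
        bestIndex = i + 1;
      }
    }
    return bestIndex;
  }

  public static void main(String[] args) {
    long[] counts = {10, 10, 80, 10};
    System.out.println("first sub-shard gets " + chooseSplitIndex(counts) + " bucket(s)");
  }
}
```

A tolerance such as the proposed allowedSizeDifference could then be compared
against the remaining imbalance to decide whether splitting within a bucket is
warranted; that step is left out of the sketch.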