loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r870636575


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
   {
     // aggregate all the sub-reports
     Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+    return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+  }
 
+  @Nonnull
+  @VisibleForTesting
+  static Map<Interval, Integer> computeIntervalToNumShards(
+      int maxRowsPerSegment,
+      Map<Interval, Union> finalCollectors
+  )
+  {
     return CollectionUtils.mapValues(
         finalCollectors,
         union -> {
           final double estimatedCardinality = union.getEstimate();
-          // determine numShards based on maxRowsPerSegment and the cardinality
-          final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+          final long estimatedNumShards;
+          if (estimatedCardinality <= 0) {
+            // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+            // it is ok if we end up not filling them all, the ingestion code handles that
+            // Seven on the other hand will at least create some shards rather than potentially a single huge one
+            estimatedNumShards = 7L;

Review Comment:
   To force this case to be collected and fixed, we could also throw an ISE 
here so that the context is repeatable. How does that sound instead of the 
guesstimate of seven shards? Rather than guesstimating, just throw an ISE and 
halt. That may be too harsh, so I think the warning is the better option, but 
we should stop there and not try to be more clever. Let's treat this as a 
fishing expedition for data to see whether this was the original problem. 
There is no evidence so far, and those of us who have tried have not been able 
to reproduce the scenario.
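
   To make the two options concrete, here is a rough sketch of what I have in 
mind. It is not the PR code: the class and method names and the 
`fallbackNumShards` parameter are hypothetical, plain `IllegalStateException` 
stands in for Druid's `ISE`, and `java.util.logging` stands in for whatever 
logger the task already uses. The happy path just mirrors the `Math.round` 
call on the removed line above.

```java
import java.util.logging.Logger;

// Hypothetical sketch; none of these names come from the PR.
final class NumShardsSketch
{
  private static final Logger LOG = Logger.getLogger(NumShardsSketch.class.getName());

  // Option 1: halt. A non-positive estimate is treated as a bug to be reported.
  static long strictNumShards(double estimatedCardinality, int maxRowsPerSegment)
  {
    if (estimatedCardinality <= 0) {
      // Stand-in for Druid's ISE; the message carries the context needed to investigate.
      throw new IllegalStateException(
          "Estimated cardinality [" + estimatedCardinality + "] is not positive, "
          + "maxRowsPerSegment [" + maxRowsPerSegment + "]"
      );
    }
    return Math.round(estimatedCardinality / maxRowsPerSegment);
  }

  // Option 2: warn and continue with a value the caller picks (assumption:
  // the caller decides the fallback; this sketch does not guess one).
  static long lenientNumShards(double estimatedCardinality, int maxRowsPerSegment, long fallbackNumShards)
  {
    if (estimatedCardinality <= 0) {
      LOG.warning(
          "Estimated cardinality [" + estimatedCardinality + "] is not positive, "
          + "using fallback numShards [" + fallbackNumShards + "]"
      );
      return fallbackNumShards;
    }
    return Math.round(estimatedCardinality / maxRowsPerSegment);
  }

  public static void main(String[] args)
  {
    System.out.println(strictNumShards(10_000_000.0, 5_000_000));
    System.out.println(lenientNumShards(-1.0, 5_000_000, 1L));
  }
}
```

   Either way, the log line or the exception message is the data we are 
fishing for, so it should include the estimate and `maxRowsPerSegment`.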



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

