jihoonson commented on a change in pull request #10592:
URL: https://github.com/apache/druid/pull/10592#discussion_r526366690



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java
##########
@@ -541,9 +561,12 @@ private TaskStatus runHashPartitionMultiPhaseParallel(TaskToolbox toolbox) throw
       );
     }
 
-    final Integer numShardsOverride;
+    final Map<Interval, Integer> intervalToNumShards;
    HashedPartitionsSpec partitionsSpec = (HashedPartitionsSpec) ingestionSchema.getTuningConfig().getPartitionsSpec();
-    if (partitionsSpec.getNumShards() == null) {
+    final boolean needsInputSampling =
+        partitionsSpec.getNumShards() == null
+        || ingestionSchemaToUse.getDataSchema().getGranularitySpec().inputIntervals().isEmpty();
+    if (needsInputSampling) {
       // 0. need to determine numShards by scanning the data
      LOG.info("numShards is unspecified, beginning %s phase.", PartialDimensionCardinalityTask.TYPE);

Review comment:
       :+1: 
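The gate in the hunk above can be sketched in isolation. The following is a minimal, hypothetical illustration (the class and method names are invented here, not Druid's): the cardinality-scanning phase runs when the user left `numShards` unset or supplied no input intervals, because in either case the data must be scanned before partitions can be decided.

```java
import java.util.List;

// Hypothetical sketch of the needsInputSampling condition from the diff above.
// The names InputSamplingCheck and needsInputSampling are illustrative only.
public class InputSamplingCheck
{
  /**
   * Returns true when a sampling/scan phase is required: either numShards
   * was not configured (null) or no input intervals were specified.
   */
  public static boolean needsInputSampling(Integer numShards, List<?> inputIntervals)
  {
    return numShards == null || inputIntervals.isEmpty();
  }
}
```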

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java
##########
@@ -552,38 +575,50 @@ private TaskStatus runHashPartitionMultiPhaseParallel(TaskToolbox toolbox) throw
               this::createPartialDimensionCardinalityRunner
           );
 
-      if (cardinalityRunner == null) {
-        throw new ISE("Could not create cardinality runner for hash partitioning.");
-      }
-
       state = runNextPhase(cardinalityRunner);
       if (state.isFailure()) {
         return TaskStatus.failure(getId());
       }
 
-      int effectiveMaxRowsPerSegment = partitionsSpec.getMaxRowsPerSegment() == null
-                                       ? PartitionsSpec.DEFAULT_MAX_ROWS_PER_SEGMENT
-                                       : partitionsSpec.getMaxRowsPerSegment();
-      LOG.info("effective maxRowsPerSegment is: " + effectiveMaxRowsPerSegment);
+      if (cardinalityRunner.getReports().isEmpty()) {
+        String msg = "No valid rows for hash partitioning."
+                     + " All rows may have invalid timestamps or have been filtered out.";
+        LOG.warn(msg);
+        return TaskStatus.success(getId(), msg);
+      }
+
+      if (partitionsSpec.getNumShards() == null) {
+        int effectiveMaxRowsPerSegment = partitionsSpec.getMaxRowsPerSegment() == null
+                                         ? PartitionsSpec.DEFAULT_MAX_ROWS_PER_SEGMENT
+                                         : partitionsSpec.getMaxRowsPerSegment();
+        LOG.info("effective maxRowsPerSegment is: " + effectiveMaxRowsPerSegment);
 
-      if (cardinalityRunner.getReports() == null) {
-        throw new ISE("Could not determine cardinalities for hash partitioning.");
+        intervalToNumShards = determineNumShardsFromCardinalityReport(
+            cardinalityRunner.getReports().values(),
+            effectiveMaxRowsPerSegment
+        );
+      } else {
+        intervalToNumShards = CollectionUtils.mapValues(
+            mergeCardinalityReports(cardinalityRunner.getReports().values()),
+            k -> partitionsSpec.getNumShards()
+        );
       }
-      numShardsOverride = determineNumShardsFromCardinalityReport(
-          cardinalityRunner.getReports().values(),
-          effectiveMaxRowsPerSegment
-      );
 
-      LOG.info("Automatically determined numShards: " + numShardsOverride);

Review comment:
      No, I didn't add one for `intervalToNumShards` because it could have lots of intervals.
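For readers following the diff, the two branches above can be sketched roughly as follows. This is a simplified, assumed model, not Druid's actual implementation: real cardinality reports carry per-interval sketches that are merged before counting, whereas here plain long counts stand in for the merged estimates, and a plausible `ceil(cardinality / maxRowsPerSegment)` rule stands in for `determineNumShardsFromCardinalityReport`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of building intervalToNumShards, mirroring the two
// branches in the diff: auto-determine numShards from estimated cardinality
// when it is unset, otherwise apply the configured value to every interval.
public class NumShardsSketch
{
  public static Map<String, Integer> intervalToNumShards(
      Map<String, Long> cardinalityPerInterval,  // interval -> estimated distinct hash keys
      Integer configuredNumShards,               // null means "determine automatically"
      int maxRowsPerSegment
  )
  {
    Map<String, Integer> result = new LinkedHashMap<>();
    for (Map.Entry<String, Long> entry : cardinalityPerInterval.entrySet()) {
      int numShards = configuredNumShards != null
                      ? configuredNumShards
                      : (int) Math.max(1, Math.ceil((double) entry.getValue() / maxRowsPerSegment));
      result.put(entry.getKey(), numShards);
    }
    return result;
  }
}
```

As the review comment notes, logging the full map would be noisy when the data spans many intervals, which is why a per-interval log line was deliberately omitted in the PR.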




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


