loquisgon commented on code in PR #12443:
URL: https://github.com/apache/druid/pull/12443#discussion_r874022794
##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -901,13 +906,43 @@ public static Map<Interval, Integer> determineNumShardsFromCardinalityReport(
{
// aggregate all the sub-reports
Map<Interval, Union> finalCollectors = mergeCardinalityReports(reports);
+ return computeIntervalToNumShards(maxRowsPerSegment, finalCollectors);
+ }
+ @Nonnull
+ @VisibleForTesting
+ static Map<Interval, Integer> computeIntervalToNumShards(
+ int maxRowsPerSegment,
+ Map<Interval, Union> finalCollectors
+ )
+ {
return CollectionUtils.mapValues(
finalCollectors,
union -> {
final double estimatedCardinality = union.getEstimate();
- // determine numShards based on maxRowsPerSegment and the cardinality
- final long estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+ final long estimatedNumShards;
+ if (estimatedCardinality <= 0) {
+ // I don't think we can use the estimate in any way being negative, seven sounds like a nice prime number
+ // it is ok if we end up not filling them all, the ingestion code handles that
+ // Seven on the other hand will at least create some shards rather than potentially a single huge one
+ estimatedNumShards = 7L;
+ LOG.warn("Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
+ estimatedCardinality, estimatedNumShards
+ );
+ } else {
+ // determine numShards based on maxRowsPerSegment and the cardinality
+ estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
+ }
+ LOG.debug("estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",
Review Comment:
It is not easy to know how frequently this would be logged without building
some data models and collecting data to better understand the real
distribution of hash buckets in a given time chunk for a given set of
dimensions. This is one of those things where experience and/or
experimentation teaches you more, IMO. So turning it on again and watching it
seems like the right thing to do this time.
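For anyone who wants to poke at the branch behaviour outside Druid, here is a minimal Java sketch of the per-interval logic in the diff. It rests on a few assumptions that are not in the PR. The DataSketches `Union` objects are replaced by plain pre-computed `double` estimates. Intervals are plain strings. `java.util.logging` stands in for Druid's logger. The class name `NumShardsSketch` is hypothetical. The sketch shows that the debug line fires once per time chunk, and that the 7-shard fallback only applies when the estimate is zero or less.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-alone sketch; not Druid code.
public class NumShardsSketch
{
  private static final Logger LOG = Logger.getLogger(NumShardsSketch.class.getName());

  // Fallback shard count the PR uses when the estimate is unusable.
  static final long FALLBACK_NUM_SHARDS = 7L;

  // Assumption: values are already-computed cardinality estimates
  // (in the PR they come from Union.getEstimate()).
  public static Map<String, Long> computeIntervalToNumShards(
      int maxRowsPerSegment,
      Map<String, Double> intervalToEstimatedCardinality
  )
  {
    Map<String, Long> result = new LinkedHashMap<>();
    for (Map.Entry<String, Double> entry : intervalToEstimatedCardinality.entrySet()) {
      final double estimatedCardinality = entry.getValue();
      final long estimatedNumShards;
      if (estimatedCardinality <= 0) {
        // Zero or negative estimate: fall back to a fixed shard count.
        estimatedNumShards = FALLBACK_NUM_SHARDS;
        LOG.warning(String.format(
            "Estimated cardinality for union of estimates is zero or less: %.2f, setting num shards to %d",
            estimatedCardinality, estimatedNumShards
        ));
      } else {
        // Same rounding as the diff; a small positive estimate can round to 0.
        estimatedNumShards = Math.round(estimatedCardinality / maxRowsPerSegment);
      }
      // Emitted once per interval (time chunk); hidden at the default INFO level.
      LOG.fine(String.format(
          "estimatedNumShards %d given estimated cardinality %.2f and maxRowsPerSegment %d",
          estimatedNumShards, estimatedCardinality, maxRowsPerSegment
      ));
      result.put(entry.getKey(), estimatedNumShards);
    }
    return result;
  }

  public static void main(String[] args)
  {
    Map<String, Double> estimates = new LinkedHashMap<>();
    estimates.put("2022-01-01/2022-01-02", 12_000_000.0);
    estimates.put("2022-01-02/2022-01-03", 0.0);
    estimates.put("2022-01-03/2022-01-04", 1_000_000.0);
    System.out.println(computeIntervalToNumShards(5_000_000, estimates));
  }
}
```

Running `main` prints `{2022-01-01/2022-01-02=2, 2022-01-02/2022-01-03=7, 2022-01-03/2022-01-04=0}`, plus the fallback warning on stderr. The last entry shows that, as written, a positive estimate below half of `maxRowsPerSegment` rounds to 0 shards. That case may also be worth watching alongside the logs.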