loquisgon commented on a change in pull request #10742: URL: https://github.com/apache/druid/pull/10742#discussion_r562027501
########## File path: server/src/test/java/org/apache/druid/segment/indexing/granularity/IntervalsByGranularityTest.java ########## @@ -76,6 +79,21 @@ public void testDups() Assert.assertTrue(count == 61); } + + @Test + public void testCondenseDoesNotMaterialize() Review comment: It only does indirectly, if it materialized it would create 1 year worth of seconds amount of intervals (over 31 million). If it materialized I reasoned that it would slow down the test or something. I can remove it or rename it. Let me know if you think of a better way to test that. I don't want to measure memory before and after using the `Runtime.getRuntime()` methods because they are not very precise either. ########## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialDimensionDistributionTask.java ########## @@ -106,7 +106,8 @@ ); } - @VisibleForTesting // Only for testing + @VisibleForTesting + // Only for testing Review comment: Ok... that comment has been there for a long time... the only reason it shows as my change is because of automatic formatting from Intelli-j. I will remove the message in next commit. ########## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractBatchIndexTask.java ########## @@ -373,23 +373,25 @@ private boolean tryLockWithDetermineResult(TaskActionClient client, LockGranular } } + protected boolean tryTimeChunkLock(TaskActionClient client, List<Interval> intervals) throws IOException { - - // In this case, the intervals to lock must be aligned with segmentGranularity if it's defined - final Iterator<Interval> intervalIterator; + // The given intervals are first converted to align with segment granularity. This is because, + // when an overwriting task finds a version for a given input row, it expects the interval + // associated to each version to be equal or larger than the time bucket where the input row falls in. + // See ParallelIndexSupervisorTask.findVersion(). + final Set<Interval> uniqueCondensedIntervals = new HashSet<>(); final Granularity segmentGranularity = getSegmentGranularity(); if (segmentGranularity == null) { - Set<Interval> uniqueIntervals = new HashSet<>(intervals); - ArrayList<Interval> condensedIntervals = JodaUtils.condenseIntervals(() -> uniqueIntervals.iterator()); - intervalIterator = condensedIntervals.iterator(); + uniqueCondensedIntervals.addAll(JodaUtils.condenseIntervals(intervals)); } else { IntervalsByGranularity intervalsByGranularity = new IntervalsByGranularity(intervals, segmentGranularity); - intervalIterator = intervalsByGranularity.granularityIntervalsIterator(); + // the following is calling a condense that does not materialize the intervals: + uniqueCondensedIntervals.addAll(JodaUtils.condenseIntervals(intervalsByGranularity.granularityIntervalsIterator())); Review comment: Yes, `addAll` is using the materialized intervals being returned by the `condenseIntervals` call. The assumption here is that the condensed intervals are much fewer than the bucket intervals. I think this assumption will hold in the majority of real cases. I think that in a rare corner case where the original intervals cannot be condensed then we still would have issues, but not in the majority of production cases. ########## File path: core/src/main/java/org/apache/druid/java/util/common/JodaUtils.java ########## @@ -50,15 +49,25 @@ sortedIntervals.add(interval); } } + return condenseIntervals(sortedIntervals.iterator()); + } - if (sortedIntervals.isEmpty()) { - return new ArrayList<>(); + /** + * Caller needs to insure that sortedIntervals is sorted in ascending order Review comment: Fixed in next commit. ########## File path: server/src/test/java/org/apache/druid/segment/indexing/granularity/IntervalsByGranularityTest.java ########## @@ -76,6 +79,21 @@ public void testDups() Assert.assertTrue(count == 61); } + + @Test + public void testCondenseDoesNotMaterialize() Review comment: I changed the name of the method and added a comment inside the method to indicate intention.. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org