[GitHub] [druid] loquisgon commented on a change in pull request #10742: Granularity interval materialization

GitBox Thu, 21 Jan 2021 22:40:49 -0800


loquisgon commented on a change in pull request #10742:
URL: https://github.com/apache/druid/pull/10742#discussion_r562027501




##########
File path: 
server/src/test/java/org/apache/druid/segment/indexing/granularity/IntervalsByGranularityTest.java
##########
@@ -76,6 +79,21 @@ public void testDups()
     Assert.assertTrue(count == 61);
   }
 
+
+  @Test
+  public void testCondenseDoesNotMaterialize()

Review comment:
       It only does indirectly, if it materialized it would create 1 year worth 
of seconds amount of intervals (over 31 million). If it materialized I reasoned 
that it would slow down the test or something. I can remove it or rename it. 
Let me know if you think of a better way to test that. I don't want to measure 
memory before and after using the `Runtime.getRuntime()` methods because they 
are not very precise either.

##########
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialDimensionDistributionTask.java
##########
@@ -106,7 +106,8 @@
     );
   }
 
-  @VisibleForTesting  // Only for testing
+  @VisibleForTesting
+    // Only for testing

Review comment:
       Ok... that comment has been there for a long time... the only reason it 
shows as my change is because of automatic formatting from Intelli-j. I will 
remove the message in next commit.

##########
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractBatchIndexTask.java
##########
@@ -373,23 +373,25 @@ private boolean 
tryLockWithDetermineResult(TaskActionClient client, LockGranular
     }
   }
 
+
   protected boolean tryTimeChunkLock(TaskActionClient client, List<Interval> 
intervals) throws IOException
   {
-
-    // In this case, the intervals to lock must be aligned with 
segmentGranularity if it's defined
-    final Iterator<Interval> intervalIterator;
+    // The given intervals are first converted to align with segment 
granularity. This is because,
+    // when an overwriting task finds a version for a given input row, it 
expects the interval
+    // associated to each version to be equal or larger than the time bucket 
where the input row falls in.
+    // See ParallelIndexSupervisorTask.findVersion().
+    final Set<Interval> uniqueCondensedIntervals = new HashSet<>();
     final Granularity segmentGranularity = getSegmentGranularity();
     if (segmentGranularity == null) {
-      Set<Interval> uniqueIntervals = new HashSet<>(intervals);
-      ArrayList<Interval> condensedIntervals = JodaUtils.condenseIntervals(() 
-> uniqueIntervals.iterator());
-      intervalIterator = condensedIntervals.iterator();
+      uniqueCondensedIntervals.addAll(JodaUtils.condenseIntervals(intervals));
     } else {
       IntervalsByGranularity intervalsByGranularity = new 
IntervalsByGranularity(intervals, segmentGranularity);
-      intervalIterator = intervalsByGranularity.granularityIntervalsIterator();
+      // the following is calling a condense that does not materialize the 
intervals:
+      
uniqueCondensedIntervals.addAll(JodaUtils.condenseIntervals(intervalsByGranularity.granularityIntervalsIterator()));

Review comment:
       Yes, `addAll` is using the materialized intervals being returned by the 
`condenseIntervals` call. The assumption here is that the condensed intervals 
are much fewer than the bucket intervals. I think this assumption will hold in 
the majority of real cases. I think that in a rare corner case where the 
original intervals cannot be condensed then we still would have issues, but not 
in the majority of production cases.

##########
File path: core/src/main/java/org/apache/druid/java/util/common/JodaUtils.java
##########
@@ -50,15 +49,25 @@
         sortedIntervals.add(interval);
       }
     }
+    return condenseIntervals(sortedIntervals.iterator());
+  }
 
-    if (sortedIntervals.isEmpty()) {
-      return new ArrayList<>();
+  /**
+   * Caller needs to insure that sortedIntervals is sorted in ascending order

Review comment:
       Fixed in next commit.

##########
File path: 
server/src/test/java/org/apache/druid/segment/indexing/granularity/IntervalsByGranularityTest.java
##########
@@ -76,6 +79,21 @@ public void testDups()
     Assert.assertTrue(count == 61);
   }
 
+
+  @Test
+  public void testCondenseDoesNotMaterialize()

Review comment:
       I changed the name of the method and added a comment inside the method 
to indicate intention..




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] loquisgon commented on a change in pull request #10742: Granularity interval materialization

Reply via email to