imply-cheddar commented on code in PR #12392:
URL: https://github.com/apache/druid/pull/12392#discussion_r846921111
##########
core/src/main/java/org/apache/druid/data/input/InputEntity.java:
##########
@@ -136,4 +136,13 @@ public void close()
{
return Predicates.alwaysFalse();
}
+
+  /**
+   * This is required so that an empty iterator is created for the reader corresponding to a tombstone
+   * @return false if the entity does not come from a tombstone segment, true otherwise
+   */
+  default boolean isFromTombstone()
Review Comment:
Please do not add this to the interface. `DruidSegmentReader` and
`DruidSegmentInputFormat` pretty much only know how to deal with
`DruidInputEntity` objects anyway (they would have a really hard time reading
an ORC-based one, for example). As such, the users can just do an `instanceof`
check and then call the concrete method directly.
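The `instanceof` approach the reviewer suggests might look like the sketch below. The types here are simplified stand-ins rather than the real Druid classes, and `isTombstone` is a hypothetical helper introduced only for illustration:

```java
// Simplified stand-ins for illustration; the real InputEntity and
// DruidSegmentInputEntity types live in the Druid codebase.
interface InputEntity {}

class DruidSegmentInputEntity implements InputEntity {
  private final boolean fromTombstone;

  DruidSegmentInputEntity(boolean fromTombstone) {
    this.fromTombstone = fromTombstone;
  }

  // Concrete method on the implementation, not a default on the interface.
  boolean isFromTombstone() {
    return fromTombstone;
  }
}

public class TombstoneCheck {
  // Callers downcast after an instanceof check instead of relying on
  // an interface-wide default method.
  public static boolean isTombstone(InputEntity entity) {
    return entity instanceof DruidSegmentInputEntity
        && ((DruidSegmentInputEntity) entity).isFromTombstone();
  }

  public static void main(String[] args) {
    System.out.println(isTombstone(new DruidSegmentInputEntity(true)));
    System.out.println(isTombstone(new InputEntity() {}));
  }
}
```

This keeps tombstone awareness confined to the one implementation that can actually represent a tombstone, rather than widening the contract for every `InputEntity`.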
##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java:
##########
@@ -550,16 +552,18 @@ private String createIndexTaskSpecId(int i)
segmentProvider,
lockGranularityInUse
);
+
final Map<DataSegment, File> segmentFileMap = pair.lhs;
final List<TimelineObjectHolder<String, DataSegment>> timelineSegments =
pair.rhs;
if (timelineSegments.size() == 0) {
return Collections.emptyList();
}
- // find metadata for interval
+ // find metadata for intervals with real data segments
// queryableIndexAndSegments is sorted by the interval of the dataSegment
-    final List<NonnullPair<QueryableIndex, DataSegment>> queryableIndexAndSegments = loadSegments(
+    // Note that this list will contain null QueryableIndex values for tombstones
+    final List<Pair<QueryableIndex, DataSegment>> queryableIndexAndSegments = loadSegments(
Review Comment:
I haven't checked all of the call sites, so just asking: from looking at this
diff, it appears as though you adjusted all of the places that know about this
list to filter out the null entries, so what's the difference between just
filtering them out here, once?
It would've been a lot nicer if there were just noop versions of these
classes that could be used, so that this logic doesn't have to change at all.
But without doing that, filtering seems to be necessary. So why filter
locally in all of the places instead of filtering just once and reusing the result?
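The "filter once" idea could be sketched as below. `Pair` and `withoutTombstones` here are illustrative stand-ins (Druid has its own `Pair` class); the point is simply that the null-`QueryableIndex` entries get dropped in one place, right after loading:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterOnce {
  // Hypothetical minimal Pair stand-in for the Druid Pair class.
  static final class Pair<L, R> {
    final L lhs;
    final R rhs;

    Pair(L lhs, R rhs) {
      this.lhs = lhs;
      this.rhs = rhs;
    }
  }

  // Drop tombstone entries (null lhs) once, so downstream consumers
  // never have to repeat the null check.
  public static <L, R> List<Pair<L, R>> withoutTombstones(List<Pair<L, R>> loaded) {
    return loaded.stream()
        .filter(p -> p.lhs != null)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Pair<String, String>> loaded = Arrays.asList(
        new Pair<>("index1", "seg1"),
        new Pair<>(null, "tombstoneSeg"),
        new Pair<>("index2", "seg2"));
    System.out.println(withoutTombstones(loaded).size()); // prints 2
  }
}
```

Filtering at the source keeps every consumer of the list free of tombstone-specific branching.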
##########
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java:
##########
@@ -105,6 +105,29 @@
@Override
protected CloseableIterator<Map<String, Object>> intermediateRowIterator()
throws IOException
{
+ if (source.isFromTombstone()) {
Review Comment:
Maybe don't do this here? `DruidSegmentInputFormat` can check if the
`InputEntity` is a tombstone and return a `DruidTombstoneSegmentReader` which
only knows how to return empty stuff
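The dispatch the reviewer describes might look like the sketch below. The class names mirror the ones mentioned in the comment, but the types and the `isEmpty` method are simplified stand-ins, not the real Druid reader API:

```java
// Illustrative stand-ins: the real readers produce CloseableIterators of
// intermediate rows; isEmpty() is a hypothetical simplification.
interface InputEntityReader {
  boolean isEmpty();
}

class DruidTombstoneSegmentReader implements InputEntityReader {
  // A no-op reader: it only knows how to return empty results.
  public boolean isEmpty() {
    return true;
  }
}

class DruidSegmentReader implements InputEntityReader {
  public boolean isEmpty() {
    return false;
  }
}

public class InputFormatDispatch {
  // The InputFormat picks the reader up front, so DruidSegmentReader
  // itself never needs a tombstone branch.
  public static InputEntityReader createReader(boolean fromTombstone) {
    return fromTombstone
        ? new DruidTombstoneSegmentReader()
        : new DruidSegmentReader();
  }

  public static void main(String[] args) {
    System.out.println(createReader(true).isEmpty());  // prints true
    System.out.println(createReader(false).isEmpty()); // prints false
  }
}
```

Moving the decision into the input format keeps each reader single-purpose, which is the same "noop version of the class" idea raised in the `CompactionTask` comment above.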
##########
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java:
##########
@@ -426,6 +426,17 @@ private CoordinatorStats doRun(
if (config.getIoConfig() != null) {
dropExisting = config.getIoConfig().isDropExisting();
}
+      // force dropExisting to true if dropExisting is not set, dropExisting is false,
+      // or all segments to compact are tombstones.
+      // Otherwise, auto-compaction will cycle indefinitely since segments won't be compacted and their
+      // compaction state won't be set
Review Comment:
This comment is basically documentation, but I fear that it is missing enough
words to fully explain why we are forcing this setting even when the cluster
wasn't configured to do this. Someone without context will get lost in the "if
dropExisting is not set, dropExisting is false, all segments are tombstones"
part, wondering what that means. Maybe re-write it as prose explaining the
situation this is attempting to resolve. Something like
> if dropExisting is not set, dropExisting is false, or all segments are
tombstones, then it is possible for auto-compaction to cycle indefinitely
because XYZ. Thus, we force dropExisting to be true to work around this.
Then also throw in an extra note that this is working around a limitation in
compaction, that we should look at how to make compaction able to make
progress even in cases like the one described here, and that when/if that is
done, this workaround can be revisited.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]