imply-cheddar commented on code in PR #12392:
URL: https://github.com/apache/druid/pull/12392#discussion_r846921111
##########
core/src/main/java/org/apache/druid/data/input/InputEntity.java:
##########
@@ -136,4 +136,13 @@ public void close()
{
return Predicates.alwaysFalse();
}
+
+  /**
+   * This is required so that an empty iterator is created for the reader corresponding to a tombstone
+   * @return false if the entity does not come from a tombstone segment, true otherwise
+   */
+  default boolean isFromTombstone()
Review Comment:
Please do not add this to the interface. `DruidSegmentReader` and
`DruidSegmentInputFormat` pretty much only know how to deal with
`DruidInputEntity` objects anyway (they would have a really hard time reading
an ORC-based one, for example). As such, the users can just do an `instanceof`
check and then call the concrete method directly.
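The `instanceof` approach the reviewer suggests might look like the sketch below. The types here are simplified stand-ins rather than the real Druid classes, and `isTombstone` is a hypothetical helper introduced only for illustration:

```java
// Simplified stand-ins for illustration; the real InputEntity and
// DruidSegmentInputEntity types live in the Druid codebase.
interface InputEntity {}

class DruidSegmentInputEntity implements InputEntity {
  private final boolean fromTombstone;

  DruidSegmentInputEntity(boolean fromTombstone) {
    this.fromTombstone = fromTombstone;
  }

  // Concrete method on the implementation, not a default on the interface.
  boolean isFromTombstone() {
    return fromTombstone;
  }
}

public class TombstoneCheck {
  // Callers downcast after an instanceof check instead of relying on
  // an interface-wide default method.
  public static boolean isTombstone(InputEntity entity) {
    return entity instanceof DruidSegmentInputEntity
        && ((DruidSegmentInputEntity) entity).isFromTombstone();
  }

  public static void main(String[] args) {
    System.out.println(isTombstone(new DruidSegmentInputEntity(true)));
    System.out.println(isTombstone(new InputEntity() {}));
  }
}
```

This keeps tombstone awareness confined to the one implementation that can actually represent a tombstone, rather than widening the contract for every `InputEntity`.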
##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java:
##########
@@ -550,16 +552,18 @@ private String createIndexTaskSpecId(int i)
segmentProvider,
lockGranularityInUse
);
+
final Map<DataSegment, File> segmentFileMap = pair.lhs;
final List<TimelineObjectHolder<String, DataSegment>> timelineSegments =
pair.rhs;
if (timelineSegments.size() == 0) {
return Collections.emptyList();
}
- // find metadata for interval
+ // find metadata for intervals with real data segments
// queryableIndexAndSegments is sorted by the interval of the dataSegment
-    final List<NonnullPair<QueryableIndex, DataSegment>> queryableIndexAndSegments = loadSegments(
+    // Note that this list will contain null QueryableIndex values for tombstones
+    final List<Pair<QueryableIndex, DataSegment>> queryableIndexAndSegments = loadSegments(
Review Comment:
I haven't checked all of the call sites, so just asking: from looking at this
diff, it appears as though you adjusted all of the places that know about this
list to filter out the null entries, so what's the difference between just
filtering them out here, once?
It would've been a lot nicer if there were just noop versions of these
classes that could be used, so that this logic doesn't have to change at all.
But without doing that, filtering seems to be necessary. So why filter
locally in all of the places instead of filtering just once and reusing the result?
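The "filter once" idea could be sketched as below. `Pair` and `withoutTombstones` here are illustrative stand-ins (Druid has its own `Pair` class); the point is simply that the null-`QueryableIndex` entries get dropped in one place, right after loading:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterOnce {
  // Hypothetical minimal Pair stand-in for the Druid Pair class.
  static final class Pair<L, R> {
    final L lhs;
    final R rhs;

    Pair(L lhs, R rhs) {
      this.lhs = lhs;
      this.rhs = rhs;
    }
  }

  // Drop tombstone entries (null lhs) once, so downstream consumers
  // never have to repeat the null check.
  public static <L, R> List<Pair<L, R>> withoutTombstones(List<Pair<L, R>> loaded) {
    return loaded.stream()
        .filter(p -> p.lhs != null)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Pair<String, String>> loaded = Arrays.asList(
        new Pair<>("index1", "seg1"),
        new Pair<>(null, "tombstoneSeg"),
        new Pair<>("index2", "seg2"));
    System.out.println(withoutTombstones(loaded).size()); // prints 2
  }
}
```

Filtering at the source keeps every consumer of the list free of tombstone-specific branching.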
##########
indexing-service/src/main/java/org/apache/druid/indexing/input/DruidSegmentReader.java:
##########
@@ -105,6 +105,29 @@
@Override
protected CloseableIterator<Map<String, Object>> intermediateRowIterator()
throws IOException
{
+ if (source.isFromTombstone()) {
Review Comment:
Maybe don't do this here? `DruidSegmentInputFormat` can check if the
`InputEntity` is a tombstone and return a `DruidTombstoneSegmentReader` which
only knows how to return empty stuff
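The dispatch the reviewer describes might look like the sketch below. The class names mirror the ones mentioned in the comment, but the types and the `isEmpty` method are simplified stand-ins, not the real Druid reader API:

```java
// Illustrative stand-ins: the real readers produce CloseableIterators of
// intermediate rows; isEmpty() is a hypothetical simplification.
interface InputEntityReader {
  boolean isEmpty();
}

class DruidTombstoneSegmentReader implements InputEntityReader {
  // A no-op reader: it only knows how to return empty results.
  public boolean isEmpty() {
    return true;
  }
}

class DruidSegmentReader implements InputEntityReader {
  public boolean isEmpty() {
    return false;
  }
}

public class InputFormatDispatch {
  // The InputFormat picks the reader up front, so DruidSegmentReader
  // itself never needs a tombstone branch.
  public static InputEntityReader createReader(boolean fromTombstone) {
    return fromTombstone
        ? new DruidTombstoneSegmentReader()
        : new DruidSegmentReader();
  }

  public static void main(String[] args) {
    System.out.println(createReader(true).isEmpty());  // prints true
    System.out.println(createReader(false).isEmpty()); // prints false
  }
}
```

Moving the decision into the input format keeps each reader single-purpose, which is the same "noop version of the class" idea raised in the `CompactionTask` comment above.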
##########
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java:
##########
@@ -426,6 +426,17 @@ private CoordinatorStats doRun(
if (config.getIoConfig() != null) {
dropExisting = config.getIoConfig().isDropExisting();
}
+      // force dropExisting to true if dropExisting is not set, dropExisting is false,
+      // or all segments to compact are tombstones.
+      // Otherwise, auto-compaction will cycle indefinitely since segments won't be compacted and their
+      // compaction state won't be set
Review Comment:
This comment is basically documentation, but I fear that it is missing enough
words to fully explain why we are forcing this setting even when the cluster
wasn't configured to do this. Someone without context will get lost in the "if
dropExisting is not set, dropExisting is false, all segments are tombstones"
part, wondering what that means. Maybe re-write it as prose explaining the
situation this is attempting to resolve. Something like
> if dropExisting is not set, dropExisting is false, or all segments are
tombstones, then it is possible for auto-compaction to cycle indefinitely
because XYZ. Thus, we force dropExisting to be true to work around this.
Then also throw in an extra note that this is working around a limitation in
compaction, that we should look at how to make compaction able to make
progress even in cases like the one described here, and that when/if that is
done, this workaround can be revisited.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]