loquisgon commented on a change in pull request #12137:
URL: https://github.com/apache/druid/pull/12137#discussion_r805094989
##########
File path:
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##########
@@ -941,9 +939,37 @@ private TaskStatus generateAndPublishSegments(
ingestionSchema
);
+ Set<DataSegment> tombStones = Collections.emptySet();
+ if (ingestionSchema.getIOConfig().isDropExisting()) {
+   TombstoneHelper tombstoneHelper = new TombstoneHelper(pushed.getSegments(),
+                                                         ingestionSchema.getDataSchema(),
+                                                         toolbox.getTaskActionClient());
+
+   List<Interval> tombstoneIntervals = tombstoneHelper.computeTombstoneIntervals();
+   // now find the versions for the tombstone intervals
+   Map<Interval, String> tombstonesAndVersions = new HashMap<>();
+   for (Interval interval : tombstoneIntervals) {
+     NonnullPair<Interval, String> intervalAndVersion =
+         findIntervalAndVersion(
Review comment:
@imply-cheddar I have updated the way allocation is done. I decided to take
the "natural" path and use the allocators. That is, I integrated the creation
of tombstones into each of the paths for the different partitioning schemes:
dynamic (linear), hash, and range. The idea is to create the tombstones at the
"end" of processing in each of those paths. At that point we know the actual
pushed segments and the input intervals (these are now required when
`dropExisting` is `true`), so we can compute and allocate the tombstones. In
each path I create the tombstones at the place where the allocator is
available. In general, each sub-task creates its own tombstones. This means
that some tombstones will not be kept, because another sub-task (of the same
parallel task) could have created a real segment in that interval. Thus, at
the "end", when the segments from all the sub-tasks are combined, there is a
step that removes those "redundant" tombstones. I added unit tests to the
corresponding parallel task test and also tested this manually on a local
server.
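
To illustrate the idea, here is a simplified, self-contained sketch. It is not
the code in this PR: `TombstoneSketch`, `Seg`, `computeTombstones` and
`removeRedundantTombstones` are hypothetical names, and intervals are modeled
as half-open `long` ranges instead of Joda `Interval`s. Versions and allocators
are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class TombstoneSketch {
    // Hypothetical stand-in for a segment: half-open range [start, end) plus a tombstone flag.
    static final class Seg {
        final long start;
        final long end;
        final boolean tombstone;

        Seg(long start, long end, boolean tombstone) {
            this.start = start;
            this.end = end;
            this.tombstone = tombstone;
        }

        boolean overlaps(Seg other) {
            return start < other.end && other.start < end;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Seg)) {
                return false;
            }
            Seg s = (Seg) o;
            return start == s.start && end == s.end && tombstone == s.tombstone;
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end, tombstone);
        }

        @Override
        public String toString() {
            return (tombstone ? "tombstone" : "real") + "[" + start + "," + end + ")";
        }
    }

    // Per sub-task, at the "end": every input bucket without a pushed segment gets a tombstone.
    static List<Seg> computeTombstones(List<Seg> inputBuckets, List<Seg> pushed) {
        List<Seg> tombstones = new ArrayList<>();
        for (Seg bucket : inputBuckets) {
            boolean hasData = false;
            for (Seg p : pushed) {
                if (p.overlaps(bucket)) {
                    hasData = true;
                    break;
                }
            }
            if (!hasData) {
                tombstones.add(new Seg(bucket.start, bucket.end, true));
            }
        }
        return tombstones;
    }

    // After combining all sub-task results: drop tombstones that overlap a real
    // segment from any sub-task, and drop exact duplicates.
    static List<Seg> removeRedundantTombstones(List<Seg> combined) {
        List<Seg> kept = new ArrayList<>();
        for (Seg s : combined) {
            boolean redundant = kept.contains(s);
            if (!redundant && s.tombstone) {
                for (Seg other : combined) {
                    if (!other.tombstone && other.overlaps(s)) {
                        redundant = true;
                        break;
                    }
                }
            }
            if (!redundant) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Seg> buckets = List.of(new Seg(0, 10, false), new Seg(10, 20, false), new Seg(20, 30, false));
        List<Seg> pushedA = List.of(new Seg(0, 10, false));   // sub-task A only has data for [0,10)
        List<Seg> pushedB = List.of(new Seg(10, 20, false));  // sub-task B only has data for [10,20)

        List<Seg> combined = new ArrayList<>();
        combined.addAll(pushedA);
        combined.addAll(computeTombstones(buckets, pushedA));
        combined.addAll(pushedB);
        combined.addAll(computeTombstones(buckets, pushedB));

        System.out.println(removeRedundantTombstones(combined));
    }
}
```

In this example sub-task A creates a tombstone for the [10,20) bucket even
though sub-task B has real data there, so that tombstone is dropped when the
results are combined. Only the [20,30) bucket, where no sub-task pushed data,
keeps a tombstone.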
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]