[GitHub] [druid] loquisgon commented on a change in pull request #12137: Batch ingestion replace

GitBox Fri, 11 Feb 2022 17:15:41 -0800


loquisgon commented on a change in pull request #12137:
URL: https://github.com/apache/druid/pull/12137#discussion_r805094989




##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##########
@@ -941,9 +939,37 @@ private TaskStatus generateAndPublishSegments(
               ingestionSchema
           );
 
+      Set<DataSegment> tombStones = Collections.emptySet();
+      if (ingestionSchema.getIOConfig().isDropExisting()) {
+        TombstoneHelper tombstoneHelper = new TombstoneHelper(pushed.getSegments(),
+                                                              ingestionSchema.getDataSchema(),
+                                                              toolbox.getTaskActionClient());
+
+        List<Interval> tombstoneIntervals = tombstoneHelper.computeTombstoneIntervals();
+        // now find the versions for the tombstone intervals
+        Map<Interval, String> tombstonesAndVersions = new HashMap<>();
+        for (Interval interval : tombstoneIntervals) {
+          NonnullPair<Interval, String> intervalAndVersion =
+              findIntervalAndVersion(

Review comment:
       @imply-cheddar I have updated the way allocation is done. I decided to
take the "natural" path and use the existing allocators: tombstone creation is
now integrated into each of the paths for the different partitioning schemes
(dynamic (linear), hash, and range). The idea is to create the tombstones at
the "end" of processing in each of those paths. At that point we know the
actual pushed segments and the input intervals (these are now required when
`dropExisting` is `true`), so we can compute and allocate the tombstones. In
each path I picked the place where the allocator is available to create the
tombstones. In general, each sub-task creates its own tombstones. This means
that some tombstones will not be kept, because another sub-task (of the same
parallel task) could have created a real segment in that interval. Thus, at
the "end", when the segments from all sub-tasks are combined, there is a step
that removes those "redundant" tombstones. I added unit tests to the
corresponding parallel task test and also tested this manually on a local
server.
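
To make the flow above concrete, here is a minimal sketch of the two steps: each sub-task computes tombstones for the input buckets it pushed nothing into, and the combining step then drops tombstones that overlap a real segment from any sub-task. This is not the code in this PR. `Span`, `uncovered`, and `mergeSubTaskTombstones` are hypothetical names, intervals are simplified to half-open `long` ranges rather than Joda `Interval`s, and allocation and version lookup are left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class TombstoneSketch {

    /** Hypothetical half-open interval [start, end); stands in for a Joda Interval. */
    static final class Span {
        final long start;
        final long end;

        Span(long start, long end) {
            this.start = start;
            this.end = end;
        }

        boolean overlaps(Span other) {
            return start < other.end && other.start < end;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) {
                return false;
            }
            Span s = (Span) o;
            return start == s.start && end == s.end;
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end);
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Returns the candidates that no segment overlaps, preserving order. */
    static List<Span> uncovered(List<Span> candidates, List<Span> segments) {
        List<Span> result = new ArrayList<>();
        for (Span candidate : candidates) {
            boolean covered = false;
            for (Span segment : segments) {
                if (candidate.overlaps(segment)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(candidate);
            }
        }
        return result;
    }

    /**
     * Combines the tombstones reported by all sub-tasks (assumption: duplicates
     * are collapsed) and drops the ones made redundant by a real segment that
     * some other sub-task pushed into the same interval.
     */
    static List<Span> mergeSubTaskTombstones(List<List<Span>> perSubTask, List<Span> allRealSegments) {
        Set<Span> unique = new LinkedHashSet<>();
        for (List<Span> tombstones : perSubTask) {
            unique.addAll(tombstones);
        }
        return uncovered(new ArrayList<>(unique), allRealSegments);
    }
}
```

For example, with three input buckets (days 0, 1, and 2), a sub-task that pushes data only for day 0 would produce tombstones for days 1 and 2, and a sub-task that pushes data only for day 1 would produce tombstones for days 0 and 2. Calling `uncovered(buckets, pushedBySubTask)` per sub-task and then `mergeSubTaskTombstones` over both results leaves a single tombstone for day 2, since days 0 and 1 each hold a real segment.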





