[GitHub] [druid] clintropolis commented on a change in pull request #11123: Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion

GitBox Fri, 16 Apr 2021 16:58:55 -0700


clintropolis commented on a change in pull request #11123:
URL: https://github.com/apache/druid/pull/11123#discussion_r615169403




##########
File path: 
server/src/main/java/org/apache/druid/segment/realtime/appenderator/AppenderatorImpl.java
##########
@@ -849,13 +851,24 @@ private DataSegment mergeAndPush(
           // semantics.
           () -> dataSegmentPusher.push(
               mergedFile,
-              
sink.getSegment().withDimensions(IndexMerger.getMergedDimensionsFromQueryableIndexes(indexes,
 schema.getDimensionsSpec())),
+              sink.getSegment()
+                  
.withDimensions(IndexMerger.getMergedDimensionsFromQueryableIndexes(
+                      indexes,
+                      schema.getDimensionsSpec()
+                  )),
               useUniquePath
           ),
           exception -> exception instanceof Exception,
           5
       );
 
+      // Drop the queriable indexes  behind the hydrants... they are not 
needed anymore and their
+      // mapped file references
+      // can generate OOMs during merge if enough of them are held back...
+      for (FireHydrant fireHydrant : sink) {
+        fireHydrant.swapSegment(null);
+      }

Review comment:
       I'm not sure this is true, I think realtime queries still use the 
pre-merge intermediary segments and we do not do any sort of swap and replace 
to the merged segment

##########
File path: 
processing/src/main/java/org/apache/druid/segment/QueryableIndexSegment.java
##########
@@ -19,24 +19,58 @@
 
 package org.apache.druid.segment;
 
+import com.google.common.base.Supplier;
+import com.google.common.base.Suppliers;
 import org.apache.druid.timeline.SegmentId;
 import org.joda.time.Interval;
 
 /**
+ *
  */
 public class QueryableIndexSegment implements Segment
 {
-  private final QueryableIndex index;
-  private final QueryableIndexStorageAdapter storageAdapter;
+  private final Supplier<QueryableIndex> indexSupplier;
+  private final Supplier<QueryableIndexStorageAdapter> 
queryableIndexStorageAdapterSupplier;

Review comment:
       I wonder if there is a way to limit the supplier to being part of the 
`FireHydrant` instead of doing this modification here?
   
   While I can't _think_ of any ill side-effect that this might cause since 
these suppliers shouldn't be called too many times in the scheme of things, 
this change also has huge surface area because of it happening here instead of 
being limited to ingestion.

##########
File path: 
server/src/main/java/org/apache/druid/segment/realtime/appenderator/AppenderatorImpl.java
##########
@@ -347,7 +348,8 @@ public AppenderatorAddResult add(
           }
         }
 
-        if (!skipBytesInMemoryOverheadCheck && bytesCurrentlyInMemory.get() - 
bytesToBePersisted > maxBytesTuningConfig) {
+        if (!skipBytesInMemoryOverheadCheck

Review comment:
       Hmm, I think the memory overhead calculation checks are going to end up 
triggering the supplier which will map the segment? Specifically 
`calculateMMappedHydrantMemoryInUsed`, which is going to try to get the storage 
adapter to count the number of columns. I think `FireHydrant` is going to need 
some way to track whether or not the segment has been mapped (related to my 
earlier thread on why the supplier might also be more suitable to live in 
`FireHydrant` somehow, so that we can have some side effect to let the hydrant 
know the mapping has happened)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] clintropolis commented on a change in pull request #11123: Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion

Reply via email to