ilooner commented on a change in pull request #1409: DRILL-6644: Don't reserve space for incoming probe batches unnecessarily during the build phase.
URL: https://github.com/apache/drill/pull/1409#discussion_r208450659
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java
 ##########
 @@ -387,9 +387,12 @@ private void calculateMemoryUsage()
         long incompletePartitionsBatchSizes = ((long) partitions) * partitionBuildBatchSize;
         // We need to reserve all the space for incomplete batches, and the incoming batch as well as the
         // probe batch we sniffed.
-        // TODO when batch sizing project is complete we won't have to sniff probe batches since
-        // they will have a well defined size.
-        reservedMemory = incompletePartitionsBatchSizes + maxBuildBatchSize + probeSizePredictor.getBatchSize();
+        reservedMemory = incompletePartitionsBatchSizes + maxBuildBatchSize;
+
+        if (!firstCycle) {
+          // If this is NOT the first cycle the HashJoin operator owns the probe batch and we need to reserve space for it.
+          reservedMemory += probeSizePredictor.getBatchSize();
 
 Review comment:
    - This should work correctly with spilling. `innerNext` calls `executeBuild`, which creates a new calculator each time. The calculator is initialized with the current build and probe batches, so it always starts from the correct state.
    - You are right, there should be no difference between `maxProbeBatchSize` and `getBatchSize()` now, since this code is only activated when reading spilled batches. This is leftover logic from when the calculator computed the worst-case incoming probe batch size during the first cycle. I will consolidate these two sizes to clean things up.
    - The incoming build-side batch is taken into account on line 390 (`maxBuildBatchSize`). *Note:* this code still reserves memory for the incoming build-side batch during the first cycle, which is not correct. I want to fix that in a separate PR, though, since those changes are quite large and require extensive changes to the unit tests.
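A minimal Java sketch of the reservation rule as described in this thread, for readers following the change. It is not Drill's `HashJoinMemoryCalculatorImpl`: `ProbeReservationSketch`, `ReservationCalculator`, `probeBatchSize`, and the sample sizes are hypothetical, and the probe batch size is assumed to be known up front rather than predicted. It shows a fresh calculator built per cycle and the probe batch reserved only after the first cycle.

```java
// Hypothetical sketch, not Drill code: class, field names and sizes are illustrative.
public class ProbeReservationSketch {

  // Stand-in for a calculator that is rebuilt for every build cycle, mirroring
  // the point that executeBuild creates a new calculator each time.
  static final class ReservationCalculator {
    private final int partitions;
    private final long partitionBuildBatchSize;
    private final long maxBuildBatchSize;
    private final long probeBatchSize;   // assumed: size of one probe batch
    private final boolean firstCycle;    // true before any spilled data is read back

    ReservationCalculator(int partitions, long partitionBuildBatchSize,
                          long maxBuildBatchSize, long probeBatchSize,
                          boolean firstCycle) {
      this.partitions = partitions;
      this.partitionBuildBatchSize = partitionBuildBatchSize;
      this.maxBuildBatchSize = maxBuildBatchSize;
      this.probeBatchSize = probeBatchSize;
      this.firstCycle = firstCycle;
    }

    long reservedMemory() {
      // One incomplete batch per partition (same expression as in the diff).
      long incompletePartitionsBatchSizes = ((long) partitions) * partitionBuildBatchSize;
      // The incoming build batch is always reserved; the review notes this is
      // still done on the first cycle and is left for a separate fix.
      long reserved = incompletePartitionsBatchSizes + maxBuildBatchSize;
      if (!firstCycle) {
        // Only when reading spilled batches does the operator own the probe batch.
        reserved += probeBatchSize;
      }
      return reserved;
    }
  }

  public static void main(String[] args) {
    for (int cycle = 0; cycle < 2; cycle++) {
      // A fresh calculator per cycle, initialized with the current state.
      ReservationCalculator calc =
          new ReservationCalculator(4, 1_000L, 2_000L, 500L, cycle == 0);
      System.out.println("cycle " + cycle + ": " + calc.reservedMemory());
    }
  }
}
```

With these sample sizes, `main` prints 6000 for the first cycle and 6500 for the spill cycle; the only difference is the 500-byte probe batch, which is reserved once the operator owns it.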

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 