GWphua commented on code in PR #18731:
URL: https://github.com/apache/druid/pull/18731#discussion_r2712085454


##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedBufferHashGrouper.java:
##########
@@ -571,7 +585,19 @@ public void adjustTableWhenFull()
 
       size = numCopied;
       tableBuffer = newTableBuffer;
+      updateMaxTableBufferUsedBytes();
       growthCount++;
     }
+
+    @Override
+    protected void updateMaxTableBufferUsedBytes()
+    {
+      long currentBufferUsedBytes = 0;
+      for (ByteBuffer buffer : subHashTableBuffers) {
+        currentBufferUsedBytes += buffer.capacity();
+      }

Review Comment:
   Hello, I have added the tests for the groupers.
   
   I did not get the same results as you, perhaps because I ran queries against a 
smaller dataset. 
   
   What I did in my tests was to query with spill-to-disk enabled:
   1. Set `druid.processing.buffer.sizeBytes` = 1GB.
   2. Query a dataset. (Let's say the result for this is 100MB.)
   3. Set `druid.processing.buffer.sizeBytes` to a much smaller value, ~5MB.
   4. Query the same dataset, and watch the usage metric cap at 5MB, with 
~95MB spilled to disk.
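
   For reference, the two runs above would correspond to something like the following `runtime.properties` fragment (the 1 GiB / ~5 MiB values are just the illustrative numbers from my steps, not recommended settings):

   ```properties
   # Run 1: large in-memory merge buffer, no spilling expected
   druid.processing.buffer.sizeBytes=1073741824

   # Run 2: shrink the buffer so most of the result set spills to disk
   # druid.processing.buffer.sizeBytes=5242880
   ```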
   
   I have to admit I do see this phenomenon in my production environment, where 
the MAX metric perpetually shows, say, 1.5GB. That value has stayed at 1.5GB 
for a month already with little change. Maybe this is fixed by reporting the 
buffer usage value instead of the buffer allocation value. Will try this out :)
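
   To illustrate what I mean by allocation vs. usage: summing `capacity()` over the sub-table buffers (as in the diff above) reports a value that is fixed at allocation time, so the MAX metric never moves once the buffers exist. A minimal standalone sketch (the class and method names here are hypothetical, not Druid code; I'm approximating "used bytes" with `position()` for buffers written sequentially):

   ```java
   import java.nio.ByteBuffer;
   import java.util.List;

   public class BufferUsageSketch
   {
     // Sums capacity(): fixed at allocation time, so a metric built on this
     // stays flat (e.g. perpetually 1.5GB) regardless of actual load.
     static long allocatedBytes(List<ByteBuffer> buffers)
     {
       long total = 0;
       for (ByteBuffer buffer : buffers) {
         total += buffer.capacity();
       }
       return total;
     }

     // Sums position(): advances as entries are written, so it tracks real
     // usage (assuming the buffers are filled sequentially).
     static long usedBytes(List<ByteBuffer> buffers)
     {
       long total = 0;
       for (ByteBuffer buffer : buffers) {
         total += buffer.position();
       }
       return total;
     }

     public static void main(String[] args)
     {
       ByteBuffer buf = ByteBuffer.allocate(1024);
       buf.putLong(42L); // write 8 bytes
       List<ByteBuffer> buffers = List.of(buf);
       System.out.println(allocatedBytes(buffers)); // 1024
       System.out.println(usedBytes(buffers));      // 8
     }
   }
   ```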



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
