GWphua commented on code in PR #18731:
URL: https://github.com/apache/druid/pull/18731#discussion_r2712085454
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedBufferHashGrouper.java:
##########
@@ -571,7 +585,19 @@ public void adjustTableWhenFull()
size = numCopied;
tableBuffer = newTableBuffer;
+ updateMaxTableBufferUsedBytes();
growthCount++;
}
+
+ @Override
+ protected void updateMaxTableBufferUsedBytes()
+ {
+ long currentBufferUsedBytes = 0;
+ for (ByteBuffer buffer : subHashTableBuffers) {
+ currentBufferUsedBytes += buffer.capacity();
+ }
Review Comment:
Hello, I have added the tests for the groupers.
I did not get the same results as you did, possibly because I ran queries
against a smaller dataset.
What I did in my tests was to run queries with spill-to-disk enabled (a sketch
of the expected numbers follows these steps):
1. Set druid.processing.buffer.sizeBytes = 1GB.
2. Run a query on a dataset (let's say the results for this take up 100MB).
3. Set druid.processing.buffer.sizeBytes to a much smaller value, ~5MB.
4. Run the same query and watch the usage metric cap at 5MB, with ~95MB
spilled to disk.
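
For concreteness, here is a minimal, self-contained sketch of the numbers those steps expect. The MetricsSketch class and every value in it are illustrative stand-ins, not from the PR or from Druid; only the high-water-mark idea comes from updateMaxTableBufferUsedBytes() in the diff above:

```java
// Illustrative only: MetricsSketch and all numbers are made up for this sketch.
public class MetricsSketch
{
  private long maxTableBufferUsedBytes = 0;

  // Mirrors the idea of updateMaxTableBufferUsedBytes(): track a high-water mark.
  void update(long currentBufferUsedBytes)
  {
    maxTableBufferUsedBytes = Math.max(maxTableBufferUsedBytes, currentBufferUsedBytes);
  }

  public static void main(String[] args)
  {
    MetricsSketch m = new MetricsSketch();
    long bufferSizeBytes = 5_000_000L;   // step 3: ~5MB processing buffer
    long resultBytes = 100_000_000L;     // step 2: ~100MB of grouped results

    // The grouper can never use more than the processing buffer, so the
    // reported usage caps at ~5MB and the remaining ~95MB spills to disk.
    m.update(Math.min(resultBytes, bufferSizeBytes));
    System.out.println("max used: " + m.maxTableBufferUsedBytes);        // 5000000
    System.out.println("spilled:  " + (resultBytes - bufferSizeBytes));  // 95000000
  }
}
```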
I do have to admit that I see this phenomenon in my production cluster, where
the MAX metric perpetually shows, say, 1.5GB. That value has stayed at 1.5GB
for a month already with little change. Maybe this is fixed by reporting the
buffer usage value instead of the buffer allocation value. Will try this out :)
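
As an aside on that fix, here is an illustrative contrast between the two reporting strategies. Only subHashTableBuffers and capacity() come from the diff above; the class name, the bucket numbers, and the usage formula are made up for the sketch:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Illustrative contrast between reporting allocation and reporting usage.
public class AllocationVsUsage
{
  public static void main(String[] args)
  {
    int bucketSizeBytes = 64;   // hypothetical size of one hash-table bucket
    int occupiedBuckets = 10;   // hypothetical live entries after a query

    // Two sub-hash-table slices, as in LimitedBufferHashGrouper.
    List<ByteBuffer> subHashTableBuffers =
        Arrays.asList(ByteBuffer.allocate(4096), ByteBuffer.allocate(4096));

    // Reporting allocation: sum of capacity(). This only ever grows, so the
    // gauge sticks at its high-water mark (the perpetual 1.5GB reading).
    long allocatedBytes = 0;
    for (ByteBuffer buffer : subHashTableBuffers) {
      allocatedBytes += buffer.capacity();
    }

    // Reporting usage: live entries times bucket size. This drops back down
    // when the tables are reset between queries.
    long usedBytes = (long) occupiedBuckets * bucketSizeBytes;

    System.out.println("allocated: " + allocatedBytes); // 8192, sticky
    System.out.println("used:      " + usedBytes);      // 640, varies per query
  }
}
```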