abhishekrb19 commented on code in PR #18731:
URL: https://github.com/apache/druid/pull/18731#discussion_r2706872449
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedBufferHashGrouper.java:
##########
@@ -571,7 +585,19 @@ public void adjustTableWhenFull()
size = numCopied;
tableBuffer = newTableBuffer;
+ updateMaxTableBufferUsedBytes();
growthCount++;
}
+
+ @Override
+ protected void updateMaxTableBufferUsedBytes()
+ {
+ long currentBufferUsedBytes = 0;
+ for (ByteBuffer buffer : subHashTableBuffers) {
+ currentBufferUsedBytes += buffer.capacity();
+ }
Review Comment:
I think this effectively would just be `tableArenaSize` which would reflect
the allocated configured size rather than actual used size?
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/ByteBufferHashTable.java:
##########
@@ -381,6 +387,16 @@ public int getGrowthCount()
return growthCount;
}
+ protected void updateMaxTableBufferUsage()
+ {
+ maxTableBufferUsage = Math.max(maxTableBufferUsage,
tableBuffer.capacity());
Review Comment:
Thanks for the update. Please see
https://github.com/apache/druid/pull/18731/changes#r2706872449
##########
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedBufferHashGrouper.java:
##########
@@ -571,7 +585,19 @@ public void adjustTableWhenFull()
size = numCopied;
tableBuffer = newTableBuffer;
+ updateMaxTableBufferUsedBytes();
growthCount++;
}
+
+ @Override
+ protected void updateMaxTableBufferUsedBytes()
+ {
+ long currentBufferUsedBytes = 0;
+ for (ByteBuffer buffer : subHashTableBuffers) {
+ currentBufferUsedBytes += buffer.capacity();
+ }
Review Comment:
Please update `LimitedBufferHashGrouperTest`, `BufferHashGrouperTest` and
related grouper tests to validate the correctness of these implementations.
I just pulled in the latest patch locally and ran some group by queries and
noticed that the `bytesUsed` and `maxBytesUsed` were more or less what was
configured `druid.processing.buffer.sizeBytes` 🤔
I’ll try to dig into this more, but in the meantime, I’d still recommend
splitting the PR into two parts: 1. max metrics 2. the bytesUsed and
maxBytesUsed metrics
2 seems a bit more involved.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]