GWphua opened a new pull request, #18731:
URL: https://github.com/apache/druid/pull/18731

   
   Fixes #17902
   
   
   ### Description
   
   
   #### Tracking merge buffer usage
   - Direct byte buffer usage happens in `AbstractBufferHashGrouper` and its implementations.
   - Each direct byte buffer uses a `ByteBufferHashTable` along with an offset tracker.
   - Usage is calculated by tracking the maximum capacity of the byte buffer in `ByteBufferHashTable` and the maximum offset size observed throughout the query's lifecycle.
   
   Calculations take the maximum observed throughout the query, so operators can better understand how large the merge buffers need to be configured.
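   
   To make this concrete, here is a minimal sketch of the "take the maximum over the query lifecycle" idea. The class and method names are illustrative only and are not the actual classes added in this PR:
   
   ```java
   // Hypothetical sketch, not the actual GroupByStatsProvider API: keep only the
   // maximum hash-table capacity and offset-tracker size observed during a query.
   import java.util.concurrent.atomic.AtomicLong;
   
   public class MergeBufferUsageAccumulator
   {
     private final AtomicLong maxHashTableBytes = new AtomicLong();
     private final AtomicLong maxOffsetTrackerBytes = new AtomicLong();
   
     /** Called whenever the grouper grows or adjusts its buffers. */
     public void record(long hashTableBytes, long offsetTrackerBytes)
     {
       maxHashTableBytes.accumulateAndGet(hashTableBytes, Math::max);
       maxOffsetTrackerBytes.accumulateAndGet(offsetTrackerBytes, Math::max);
     }
   
     /** Peak merge buffer bytes the query needed, reported once the query completes. */
     public long peakUsageBytes()
     {
       return maxHashTableBytes.get() + maxOffsetTrackerBytes.get();
     }
   }
   ```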
   
   
   #### Release note
   Group-by queries now track their merge buffer usage (peak hash table capacity and offset size) over the query lifecycle, helping operators understand how large merge buffers need to be configured.
   
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `GroupByStatsProvider`
   
   <hr>
   
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] a release note entry in the PR description.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] been tested in a test Druid cluster.
   
   ### Future Plans
   While building this PR, I came across some further enhancements that could be introduced in the future:
   
   #### Adding per-query stats to query logs
   It would also make sense to add this per-query information to the query logs.
   
   #### Per-buffer metrics
   The current metric is useful, but it will not report accurately for nested group-bys. As far as I know, a nested group-by limits its merge buffer usage to two buffers, meaning a merge buffer gets re-used, so a per-query metric will likely over-report merge buffer usage (presumably because the same physical buffer's usage is counted once per level that re-uses it).
   
   It would be better to report per-buffer usage instead of per-query usage.
   
   #### Simplify Memory Management
   Right now we need to configure the following for each queryable service:
   1. merge buffer size
   2. number of merge buffers
   3. direct memory = (numProcessingThreads + numMergeBuffers + 1) * mergeBufferSizeBytes (worked example below)
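   
   For reference, here is an illustrative sketch of that sizing arithmetic. The `druid.processing.*` property names come from the Druid configuration docs; the numbers are made up for the example:
   
   ```java
   // Illustrative only: the direct memory requirement implied by the formula above.
   public class DirectMemorySizing
   {
     static long requiredDirectMemoryBytes(int numProcessingThreads, int numMergeBuffers, long mergeBufferSizeBytes)
     {
       // direct memory = (numProcessingThreads + numMergeBuffers + 1) * mergeBufferSizeBytes
       return (numProcessingThreads + numMergeBuffers + 1L) * mergeBufferSizeBytes;
     }
   
     public static void main(String[] args)
     {
       long sizeBytes = 500L << 20;  // druid.processing.buffer.sizeBytes = 500 MiB (example value)
       int numThreads = 7;           // druid.processing.numThreads (example value)
       int numMergeBuffers = 2;      // druid.processing.numMergeBuffers (example value)
   
       // (7 + 2 + 1) * 500 MiB = 5000 MiB: the minimum -XX:MaxDirectMemorySize to configure.
       System.out.println(requiredDirectMemoryBytes(numThreads, numMergeBuffers, sizeBytes));
     }
   }
   ```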
   
   It would be great if we could simplify this down to configuring only the direct memory and managing a memory pool out of it. That would allow more flexibility: unused memory allocated for merge buffers could be used by processing threads instead.

