maytasm opened a new pull request, #19141:
URL: https://github.com/apache/druid/pull/19141
Add spill file count limit for GroupBy query
### Description
GroupBy queries that group on high-cardinality dimensions can create a large
number of spill files. This problem is more likely when queries contain many
aggregators and/or aggregators with large memory footprints (e.g., DataSketches).
This is because GroupBy can hold only a limited number of unique groupings in
memory before flushing to disk. The exact limit depends on the size of each
row, which is determined by the size of the aggregators. The issue arises when
GroupBy attempts to merge all the spill files. Currently, GroupBy merges spill
files by opening all of them simultaneously. Opening these files requires
memory for objects such as MappingIterator, SmileParser, etc., which can cause
historical nodes to OOM.
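
To illustrate why the memory cost grows with the number of spill files, here is a minimal sketch of a k-way merge that keeps one open iterator per input for the whole merge. It is not Druid's code: `KWayMergeSketch` and `Head` are hypothetical names, and in-memory sorted lists stand in for spill files (in the real case each open input would also carry parser and iterator objects such as those mentioned above).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not taken from the Druid code base.
final class KWayMergeSketch {
    // One heap entry per input. Each entry pins its source iterator,
    // so every input stays open until it is fully drained.
    private static final class Head {
        final int value;
        final Iterator<Integer> source;

        Head(int value, Iterator<Integer> source) {
            this.value = value;
            this.source = source;
        }
    }

    // Merges already-sorted runs into one sorted list.
    static List<Integer> merge(List<List<Integer>> sortedRuns) {
        PriorityQueue<Head> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.value, b.value));

        // All inputs are opened up front: per-input overhead is paid
        // sortedRuns.size() times at once.
        for (List<Integer> run : sortedRuns) {
            Iterator<Integer> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }

        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.add(smallest.value);
            // Refill from the same input the smallest element came from.
            if (smallest.source.hasNext()) {
                heap.add(new Head(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }
}
```

With N inputs the heap holds up to N entries, each pinning an open source, which is why capping N bounds the memory needed by the merge.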
This PR fixes the issue by introducing a new property,
`druid.query.groupBy.maxSpillFileCount`:
the maximum number of spill files allowed per GroupBy query. When the limit
is reached, the query fails with a `ResourceLimitExceededException`. This
property can be used to prevent historical nodes from OOMing due to an
excessive number of spill files being opened simultaneously during the merge
phase. Defaults to `Integer.MAX_VALUE` (unlimited). Can also be set per query via
the query context key `maxSpillFileCount`.
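
The kind of check described above could look roughly like the following sketch. It is not the implementation in this PR: `SpillFileTracker` and `SpillLimitExceededException` are hypothetical stand-ins (the latter for `ResourceLimitExceededException`), and the sketch assumes the failure is raised when a query tries to create a file beyond the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ResourceLimitExceededException.
final class SpillLimitExceededException extends RuntimeException {
    SpillLimitExceededException(String message) {
        super(message);
    }
}

// Hypothetical per-query tracker; names and behavior are assumptions.
final class SpillFileTracker {
    private final int maxSpillFileCount;
    private final List<String> spillFiles = new ArrayList<>();

    SpillFileTracker(int maxSpillFileCount) {
        if (maxSpillFileCount <= 0) {
            throw new IllegalArgumentException("maxSpillFileCount must be positive");
        }
        this.maxSpillFileCount = maxSpillFileCount;
    }

    // Registers a new spill file name, or fails fast once the limit is hit.
    synchronized String registerSpillFile() {
        if (spillFiles.size() >= maxSpillFileCount) {
            throw new SpillLimitExceededException(
                "Spill file count limit of " + maxSpillFileCount + " reached");
        }
        String name = String.format("%08d.tmp", spillFiles.size());
        spillFiles.add(name);
        return name;
    }

    synchronized int fileCount() {
        return spillFiles.size();
    }
}
```

Failing at file-creation time means the query is rejected before the merge phase ever tries to open all the files. With the default of `Integer.MAX_VALUE`, a check of this shape would effectively never trigger.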
---
### Release Notes
- Added a new GroupBy query configuration property,
`druid.query.groupBy.maxSpillFileCount`, to limit the maximum number of spill
files created per query. When the limit is exceeded, the query fails with a
clear error message instead of causing historical nodes to OOM during spill
file merging. The limit can also be overridden per query via the query context
key `maxSpillFileCount`.
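
For illustration, the two ways of setting the limit might look like the fragments below. The values 1000 and 200 are arbitrary placeholders rather than recommendations, and placing the property in the service's `runtime.properties` is an assumption about a typical deployment.

```properties
# Cluster-wide limit on spill files per GroupBy query (placeholder value)
druid.query.groupBy.maxSpillFileCount=1000
```

```json
{
  "context": {
    "maxSpillFileCount": 200
  }
}
```

The JSON fragment shows only the `context` portion of a query; the rest of the query is omitted.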
##### Key changed/added classes in this PR
* `processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/LimitedTemporaryStorage.java`
* `processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/SpillingGrouper.java`
This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [x] a release note entry in the PR description.
- [x] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code
wherever it would not be obvious to an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.