kosiew opened a new pull request, #18006:
URL: https://github.com/apache/datafusion/pull/18006
## **Which issue does this PR close?**
Closes #17897
**Summary:**
`MinMaxBytesAccumulator::update_batch` previously allocated a `locations`
buffer sized to `total_num_groups` for every batch.
As `total_num_groups` grew with each newly seen group, later batches
allocated progressively larger vectors, causing **quadratic time and memory
behavior** in high-cardinality workloads (`MIN`/`MAX` over `Utf8` columns).
This PR eliminates that scaling issue by introducing **adaptive mode
selection** and **reused dense/sparse state management**.
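To make the scaling issue concrete, here is a minimal, hypothetical sketch of
the old per-batch allocation pattern; the type and field names are illustrative
stand-ins, not the actual DataFusion code.

```rust
/// Illustrative stand-in for the pre-PR accumulator shape.
struct OldMinMaxBytesAccumulator {
    // per-group min/max storage elided
}

impl OldMinMaxBytesAccumulator {
    fn update_batch(&mut self, group_indices: &[usize], total_num_groups: usize) {
        // Allocated fresh on every batch and sized to ALL groups ever seen,
        // not just the groups this batch actually touches.
        let mut locations: Vec<Option<usize>> = vec![None; total_num_groups];
        for (row, &group) in group_indices.iter().enumerate() {
            locations[group] = Some(row);
        }
        // ... compare and update the per-group min/max byte values ...
    }
}
```

With `B` batches that keep introducing new groups, the allocation work alone
grows roughly as `1 + 2 + ... + B` group slots, which is the quadratic behavior
described above.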
---
## **Rationale for this change**
The previous implementation implicitly assumed that reallocating per-group
state proportional to `total_num_groups` on every batch was acceptable.
In real workloads with millions of distinct groups, however, that assumption
makes aggregation throughput degrade non-linearly.
This change re-architects the accumulator to choose dynamically between
**dense**, **dense-inline**, and **sparse-optimized** modes based on observed
workload characteristics (density, group reuse, and access-pattern stability);
a sketch of this selection follows the goals list below.
The goals are to:
* Maintain linear update cost per active group rather than per historical
group.
* Reuse dense scratch allocations across batches when beneficial.
* Use hash-based tracking for sparse, high-cardinality scenarios.
* Minimize first-batch cold-start overhead and unnecessary memory churn.
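As a rough sketch of how adaptive selection can work: the `WorkloadMode` name
comes from this PR, but the specific statistics, fields, and thresholds below
are assumptions for illustration only.

```rust
/// Variants named in this PR; their exact semantics live in the real code.
enum WorkloadMode {
    Dense,       // reuse a dense scratch allocation across batches
    DenseInline, // dense handling with inline, lazily allocated marks
    Sparse,      // hash-based tracking for high-cardinality batches
}

/// Hypothetical per-batch statistics, in the spirit of `record_batch_stats`.
struct BatchStats {
    groups_touched: usize,
    total_num_groups: usize,
    monotonic_group_ids: bool,
}

/// Illustrative heuristic: prefer dense handling when a batch touches a large
/// fraction of all known groups; fall back to sparse tracking otherwise.
fn choose_mode(stats: &BatchStats) -> WorkloadMode {
    let density = stats.groups_touched as f64 / stats.total_num_groups.max(1) as f64;
    if stats.monotonic_group_ids || density >= 0.5 {
        WorkloadMode::Dense
    } else if stats.total_num_groups <= 4_096 {
        WorkloadMode::DenseInline
    } else {
        WorkloadMode::Sparse
    }
}
```

The heuristics in the PR itself also weigh group reuse and access-pattern
stability across batches, as noted above.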
---
## **What changes are included in this PR?**
### Core Logic
* Rewrote `update_batch_dense_impl`, `update_batch_sparse_impl`, and added
**`WorkloadMode`** and **`DenseInline`** paths to unify mode-specific handling.
* Added **adaptive heuristics** (`record_batch_stats`) that observe
per-batch group density, stability, and access monotonicity to switch modes
intelligently.
* Refactored the sparse path to use **`hashbrown`** for faster group lookups
and reduced memory overhead.
* Introduced **epoch-tracked mark reuse** so dense allocations can persist
across batches without a full re-zeroing (see the sketch after this list).
* Implemented **lazy mark allocation** to remove cold-start regressions while
preserving reuse.
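A hedged sketch of the epoch-tracking idea follows; the structs below are
illustrative only, though the sparse side uses `hashbrown` as the bullet above
describes.

```rust
use hashbrown::HashMap;

/// Illustrative dense scratch state that survives across batches.
struct DenseScratch {
    /// One slot per group id. A group counts as "seen this batch" only when
    /// its stored epoch equals `current_epoch`, so the vector never needs a
    /// full re-zeroing between batches.
    epochs: Vec<u32>,
    current_epoch: u32,
}

impl DenseScratch {
    fn begin_batch(&mut self, total_num_groups: usize) {
        // Lazy growth: pay for slots only when the dense path is taken.
        if self.epochs.len() < total_num_groups {
            self.epochs.resize(total_num_groups, 0);
        }
        self.current_epoch += 1;
        if self.current_epoch == u32::MAX {
            // Rare wraparound: clear the marks and restart the counter.
            self.epochs.fill(0);
            self.current_epoch = 1;
        }
    }

    /// Returns true the first time `group` is seen in the current batch.
    fn mark(&mut self, group: usize) -> bool {
        let first_visit = self.epochs[group] != self.current_epoch;
        self.epochs[group] = self.current_epoch;
        first_visit
    }
}

/// Illustrative sparse state: only groups touched by a batch occupy memory.
struct SparseScratch {
    /// group id -> index of the current best (min or max) value
    locations: HashMap<usize, usize>,
}
```

In this shape the dense path amortizes its allocation across batches, while
the sparse path's memory is proportional to the groups a batch actually
touches rather than to `total_num_groups`.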
### Supporting Structures
* Added new helper types in `min_max_struct.rs` to encapsulate mode-specific
state and reuse logic.
* Unified dense and sparse state transitions under a consistent contract via
`update_batch` and `record_batch_stats`.
### Benchmark Suite
* Added a comprehensive **Criterion benchmark** file:
`datafusion/functions-aggregate/benches/min_max_bytes.rs`
covering 18 workloads, including:
* `dense reused accumulator`
* `sparse groups`
* `ultra sparse`
* `dense first batch`
* `monotonic group ids`
* `mode transition`
* `quadratic growing total groups`
* `extreme duplicates`
* and more
* Benchmarks validate adaptive mode selection under dense, sparse, mixed, and
transitional patterns; a representative bench is sketched below.
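For orientation, a Criterion bench of this shape might look like the sketch
below; the real `min_max_bytes.rs` drives the actual accumulator through
DataFusion's aggregate APIs, whereas the kernel here is a simplified stand-in.

```rust
// Simplified stand-in for one of the 18 workloads ("sparse groups").
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
use std::collections::HashMap;

/// Hypothetical kernel: track the minimum byte string per group.
fn min_bytes_update(groups: &[usize], values: &[&str], state: &mut HashMap<usize, String>) {
    for (&g, &v) in groups.iter().zip(values) {
        state
            .entry(g)
            .and_modify(|m| {
                if v < m.as_str() {
                    *m = v.to_string();
                }
            })
            .or_insert_with(|| v.to_string());
    }
}

fn bench_sparse_groups(c: &mut Criterion) {
    // Sparse pattern: each row touches a distinct, widely spaced group id.
    let groups: Vec<usize> = (0..8_192usize).map(|i| i * 1_000).collect();
    let values: Vec<String> = (0..8_192).map(|i| format!("value-{i:08}")).collect();
    let value_refs: Vec<&str> = values.iter().map(String::as_str).collect();

    c.bench_function("min bytes sparse groups (sketch)", |b| {
        b.iter_batched(
            HashMap::new, // fresh state per iteration, like a cold accumulator
            |mut state| min_bytes_update(&groups, &value_refs, &mut state),
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, bench_sparse_groups);
criterion_main!(benches);
```

Using `iter_batched` keeps the per-iteration state out of the measured setup,
so each sample starts from an empty map, roughly mirroring a cold accumulator.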
### Cargo Additions
* Added `hashbrown` dependency for efficient sparse tracking.
* Registered new Criterion bench target in `Cargo.toml`.
---
## **Are these changes tested?**
**Yes, extensively tested and benchmarked.**
* Criterion benchmarks confirm elimination of the previous **O(n²)** scaling
pattern.
* Unit tests and regression checks cover:
* Dense-inline mark reuse and lazy allocation.
* Sparse rehashing and stable-group reuse.
* Mode transitions (dense → sparse) and monotonic ID sequences.
* Benchmarks show the expected improvements:
  * **Ultra sparse:** −90 % runtime
  * **Monotonic dense:** −40 %
  * **Multi-batch dense reused:** −12 %
* Only a minor +1–4 % overhead on cold-start single-batch runs.
**Criterion Benchmark Summary (Statistically Significant Changes)**

| Benchmark | Mean Change | P-value |
|---|---:|---:|
| min bytes dense duplicate groups | -7.87% | 0.000000 |
| min bytes dense first batch | -4.72% | 0.000000 |
| min bytes dense reused accumulator | -1.30% | 0.040000 |
| min bytes extreme duplicates | -3.70% | 0.000000 |
| min bytes growing total groups | -19.38% | 0.000000 |
| min bytes large dense groups | -2.49% | 0.000000 |
| min bytes medium cardinality stable | -1.75% | 0.000000 |
| min bytes mode transition | -23.40% | 0.000000 |
| min bytes monotonic group ids | -39.05% | 0.000000 |
| min bytes multi batch large | -39.70% | 0.000000 |
| min bytes quadratic growing total groups | -43.87% | 0.000000 |
| min bytes sequential dense large allocations | -6.03% | 0.000000 |
| min bytes single batch large | -7.83% | 0.000000 |
| min bytes single batch small | -2.78% | 0.000000 |
| min bytes sparse groups | -27.17% | 0.000000 |
| min bytes ultra sparse | -92.99% | 0.000000 |
| min bytes sequential stable groups | +2.08% | 0.000000 |
---
## **Are there any user-facing changes?**
No public API or SQL-level changes.
All modifications are internal to the aggregate kernel implementation for
`MIN`/`MAX` over `Utf8` data.
User-visible behavior (query semantics, results, and DataFusion API) remains
unchanged.
Only runtime performance and memory efficiency are improved.
---
### **Summary**
This PR completes a major refactor of the `MinMaxBytesAccumulator`, turning
it into an **adaptive, mode-sensitive aggregator** that scales efficiently
across dense and sparse workloads, validated through extensive synthetic
benchmarks and regression tests.