alamb opened a new issue, #11680: URL: https://github.com/apache/datafusion/issues/11680
### Is your feature request related to a problem or challenge? As described on https://github.com/apache/datafusion/issues/11679, we can do better for high cardinality aggregates One thing that consumes significant time in such queries is hashing, and I think we can reduce that significantly. Specifically, for the multi-phase repartition plan, the number of hashed rows is something like ``` (input cardinality) + 2 * (intermediate group cardinality) * (number of partitions) ``` For low cardinality aggregates (e.g when the intermediate group cardinality is 1000) the second term is small (a few thousand extra hashes isn't a big deal) However, for high cardinality aggregates (eg. when the intermediate cardinality is like 1,000,000 and there are 16 partitions) the second term is substantial In pictures, this looks like ``` ▲ ▲ │ │ │ │ │ │ │ │ │ │ ┌───────────────────────┐ ┌───────────────────────┐ 4. The AggregateMode::Final │GroupBy │ │GroupBy │ GroupBy computes hash(group keys) │(AggregateMode::Final) │ │(AggregateMode::Final) │ *AGAIN* to find the correct hash │ │ │ │ bucket └───────────────────────┘ └───────────────────────┘ ▲ ▲ │ │ └─────────────┬────────────┘ │ │ │ ┌─────────────────────────┐ 3. The output of the first phase │ Repartition │ is repartitioned by computing │ HASH(x) │ hash(group keys) -- this is the └─────────────────────────┘ same hash as computed in step 2. ▲ │ ┌───────────────┴─────────────┐ │ │ │ │ ┌─────────────────────────┐ ┌──────────────────────────┐ 2. Each AggregateMode::Partial │ GroubyBy │ │ GroubyBy │ GroupBy hashes the group keys to │(AggregateMode::Partial) │ │ (AggregateMode::Partial) │ find the correct hash bucket. └─────────────────────────┘ └──────────────────────────┘ ▲ ▲ │ ┌┘ │ │ .─────────. .─────────. ,─' '─. ,─' '─. ; Input : ; Input : 1. Input is read : Partition 0 ; : Partition 1 ; ╲ ╱ ╲ ╱ '─. ,─' '─. ,─' `───────' `───────' ``` This effect can be seen in profiling for click bench XXX: (TODO screen shot showing large amounts of work done in hashing) ### Describe the solution you'd like The basic idea is rather than rather than recompute the hash values in `RepartitionExec` and `AggregateMode::Final` we would reuse the values from `AggregateMode::Partial` (which has already computed a hash value for each input group) Something like this ```text ▲ ▲ │ │ │ │ │ │ │ │ │ │ ┌───────────────────────┐ ┌───────────────────────┐ 4. The AggregateMode::Final │GroupBy │ │GroupBy │ GroupBy also gets the hash values │(AggregateMode::Final) │ │(AggregateMode::Final) │ and does not recompute them │ │ │ │ └───────────────────────┘ └───────────────────────┘ ▲ ▲ ▲ │ │ │ └─────────────┬────────────┘ Pass hash │ │ values up the │ plan rather │ │ than ┌─────────────────────────┐ 3. In addition to the partial recomputing │ │ Repartition │ aggregates and group values, *ALSO* them │ PRECOMPUTED_HASH │ pass the hash values to the │ └─────────────────────────┘ RepartitionExec which also passed ▲ them on to the AggregateMode::Final │ │ ┌───────────────┴─────────────┐ │ │ │ │ │ ┌─────────────────────────┐ ┌──────────────────────────┐ 2. Each AggregateMode::Partial │ GroubyBy │ │ GroubyBy │ GroupBy hashes the group keys to │(AggregateMode::Partial) │ │ (AggregateMode::Partial) │ find the correct hash bucket. └─────────────────────────┘ └──────────────────────────┘ ▲ ▲ │ ┌┘ │ │ .─────────. .─────────. ,─' '─. ,─' '─. ; Input : ; Input : 1. Input is read : Partition 0 ; : Partition 1 ; ╲ ╱ ╲ ╱ '─. ,─' '─. ,─' `───────' `───────' ``` ### Describe alternatives you've considered We maybe could pass the data as an explicit new column somehow, or maybe as a field in a struct array 🤔 ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org