avantgardnerio commented on PR #7192: URL: https://github.com/apache/arrow-datafusion/pull/7192#issuecomment-1672072928
TLDR: with the naive, unoptimized version in place, it looks to be 2X slower according to a test with [realistic data](https://github.com/avantgardnerio/arrow-datafusion/blob/deb23674b7fc45a40d74e241b0504c5067d83751/datafusion/core/src/physical_plan/aggregates/priority_queue.rs#L399):

<img width="371" alt="Screenshot 2023-08-09 at 1 58 03 PM" src="https://github.com/apache/arrow-datafusion/assets/3855243/c725e323-5082-4f8d-9197-34ff6774975e">

This is because currently the normal aggregation runs twice, whereas with the rule enabled we get one of each: one limited and one normal (note `lim=[10]` appears only on the final `AggregateExec`):

```
GlobalLimitExec: skip=0, fetch=10
  SortPreservingMergeExec: [MAX(traces.timestamp_ms)@1 DESC], fetch=10
    SortExec: fetch=10, expr=[MAX(traces.timestamp_ms)@1 DESC]
      AggregateExec: mode=FinalPartitioned, gby=[trace_id@0 as trace_id], aggr=[MAX(traces.timestamp_ms)], lim=[10]
        CoalesceBatchesExec: target_batch_size=8192
          RepartitionExec: partitioning=Hash([trace_id@0], 10), input_partitions=10
            AggregateExec: mode=Partial, gby=[trace_id@0 as trace_id], aggr=[MAX(traces.timestamp_ms)]
              RepartitionExec: partitioning=RoundRobinBatch(10), input_partitions=1
                MemoryExec: partitions=1, partition_sizes=[1]
```

I'm not super worried because:

1. it just validates the concerns we all had about tree balancing and heap allocations
2. when the new rule runs twice, the second invocation should be on negligible data, so I predict it'll be back on par with the unlimited aggregation

No matter what, this rule is much more memory efficient. I'll pass the limit down the tree and we'll see if I'm right and we match speed.
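For anyone following along, here is a minimal, self-contained sketch of the limited-aggregation idea under discussion. It is an illustration only, not the code in this PR: `TopKMaxAccumulator` and its API are hypothetical. The point it demonstrates is the memory claim above: retain only the top k groups by running MAX and prune the rest, so memory stays proportional to k rather than to the number of distinct groups.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of a limited ("top k") MAX aggregation: only the k
/// largest groups are kept, so memory is bounded by k rather than by the
/// number of distinct groups. Not the actual implementation in this PR.
struct TopKMaxAccumulator {
    k: usize,
    /// group key -> running max for that group
    maxes: HashMap<String, i64>,
}

impl TopKMaxAccumulator {
    fn new(k: usize) -> Self {
        Self { k, maxes: HashMap::new() }
    }

    /// Fold one (group, value) pair into the accumulator.
    fn update(&mut self, group: &str, value: i64) {
        let max = self.maxes.entry(group.to_string()).or_insert(i64::MIN);
        if value > *max {
            *max = value;
        }
        // Amortize pruning: only shrink back to k once we hold 2k groups.
        // A real implementation would use a heap instead of a full sort.
        if self.maxes.len() > 2 * self.k {
            self.prune();
        }
    }

    /// Keep only the k groups with the largest running max.
    fn prune(&mut self) {
        let mut entries: Vec<(String, i64)> = self.maxes.drain().collect();
        entries.sort_unstable_by(|a, b| b.1.cmp(&a.1)); // descending by max
        entries.truncate(self.k);
        self.maxes = entries.into_iter().collect();
    }

    /// Emit the final top-k groups, ordered by max descending.
    fn finish(mut self) -> Vec<(String, i64)> {
        self.prune();
        let mut out: Vec<(String, i64)> = self.maxes.into_iter().collect();
        out.sort_unstable_by(|a, b| b.1.cmp(&a.1));
        out
    }
}

fn main() {
    let rows = [("a", 3), ("b", 9), ("a", 7), ("c", 1), ("d", 5)];
    let mut acc = TopKMaxAccumulator::new(2);
    for (group, value) in rows {
        acc.update(group, value);
    }
    // The two groups with the largest MAX values: [("b", 9), ("a", 7)]
    println!("{:?}", acc.finish());
}
```

Pruning is lossless for `MAX` ordered descending: the kth retained max only ever grows, so values dropped with an evicted group were already at or below it and (ties aside) could never have made the final top k. That same monotonicity is what should make it safe to pass the limit further down the tree to the partial aggregate.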
