[GitHub] [arrow-datafusion] avantgardnerio commented on pull request #7192: Create a Priority Queue based Aggregation with `limit`

via GitHub Wed, 09 Aug 2023 14:52:30 -0700


avantgardnerio commented on PR #7192:
URL: https://github.com/apache/arrow-datafusion/pull/7192#issuecomment-1672204816


   We can see it doing the right thing now:
   
   ```
   GlobalLimitExec: skip=0, fetch=10
     SortPreservingMergeExec: [MAX(traces.timestamp_ms)@1 DESC], fetch=10
       SortExec: fetch=10, expr=[MAX(traces.timestamp_ms)@1 DESC]
         AggregateExec: mode=FinalPartitioned, gby=[trace_id@0 as trace_id], aggr=[MAX(traces.timestamp_ms)], lim=[10]
           CoalesceBatchesExec: target_batch_size=8192
             RepartitionExec: partitioning=Hash([trace_id@0], 10), input_partitions=10
               AggregateExec: mode=Partial, gby=[trace_id@0 as trace_id], aggr=[MAX(traces.timestamp_ms)], lim=[10]
                 MemoryExec: partitions=10, partition_sizes=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
   ```
   
   ```
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   got batch with 8000 rows
   emit batch with 10 rows
   
   got batch with 13 rows
   emit batch with 10 rows
   got batch with 12 rows
   emit batch with 10 rows
   got batch with 12 rows
   emit batch with 10 rows
   got batch with 14 rows
   emit batch with 10 rows
   got batch with 11 rows
   emit batch with 10 rows
   got batch with 8 rows
   emit batch with 8 rows
   got batch with 7 rows
   emit batch with 7 rows
   got batch with 11 rows
   emit batch with 10 rows
   got batch with 5 rows
   emit batch with 5 rows
   got batch with 7 rows
   emit batch with 7 rows
   ```
   
   but very slowly (this is a debug build, which is roughly 10x slower, so divide the time by 10 for a release estimate):
   
   ```
   +----------------------------------+--------------------------+
   | trace_id                         | MAX(traces.timestamp_ms) |
   +----------------------------------+--------------------------+
   | 2e09ebbb4cb110202e6ee274418eaff9 | 1690937510093            |
   | 8c46e3daa65cd6720c1763751ff99f2f | 1690937510093            |
   | e1de659ba388107b2ae1b0302d1a933d | 1690937510091            |
   | 522d35c60450ac951e320acfdde281a7 | 1690937510091            |
   | 998e424750c5cb2e92adea88577cced8 | 1690937510090            |
   | d518d3f57375dc9ef79772e7b98ad39d | 1690937510088            |
   | e6002e35635bc941cfa1c0b8e24903a5 | 1690937510088            |
   | a321a88f60f1836f0900e9f43f59f90d | 1690937510088            |
   | 8bbf8ec2eda9821d4463bcc0a760327f | 1690937510088            |
   | a998a8f6cce15226c9a927084e3b3c60 | 1690937510088            |
   +----------------------------------+--------------------------+
   Aggregated 80000 rows in 344.1415ms
   ```
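
For readers unfamiliar with the idea, here is a minimal sketch of a limit-aware `MAX` aggregation: it keeps only the `limit` groups with the largest maximum and discards the rest as rows arrive, which is why each 8000-row batch above turns into at most 10 emitted rows. This is not the code from the PR. `TopKMax`, `insert` and `emit` are hypothetical names, keys are plain strings rather than Arrow arrays, and an ordered set (`BTreeSet`) stands in for the priority queue to keep the example short.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper (not DataFusion's API): tracks MAX(value) per group,
/// but only for the `limit` groups with the largest maxima.
struct TopKMax {
    limit: usize,
    /// group key -> current maximum for that group
    maxes: HashMap<String, i64>,
    /// (maximum, group key) in ascending order; the first entry is the weakest group
    ordered: BTreeSet<(i64, String)>,
}

impl TopKMax {
    fn new(limit: usize) -> Self {
        Self { limit, maxes: HashMap::new(), ordered: BTreeSet::new() }
    }

    /// Feed one input row (group key, value).
    fn insert(&mut self, key: &str, value: i64) {
        if self.limit == 0 {
            return;
        }
        // Known group: raise its maximum if the new value is larger.
        if let Some(current) = self.maxes.get_mut(key) {
            if value > *current {
                self.ordered.remove(&(*current, key.to_string()));
                *current = value;
                self.ordered.insert((value, key.to_string()));
            }
            return;
        }
        // New group while full: it must beat the weakest tracked group.
        if self.maxes.len() >= self.limit {
            let weakest = match self.ordered.iter().next().cloned() {
                Some(entry) => entry,
                None => return,
            };
            if value <= weakest.0 {
                return; // cannot enter the top `limit`, drop the row
            }
            // Evicting is safe for MAX: the evicted group can only come back
            // with a value above the (non-decreasing) threshold, which is
            // then its true maximum.
            self.ordered.remove(&weakest);
            self.maxes.remove(&weakest.1);
        }
        self.maxes.insert(key.to_string(), value);
        self.ordered.insert((value, key.to_string()));
    }

    /// Emit the tracked groups, largest maximum first.
    fn emit(&self) -> Vec<(String, i64)> {
        self.ordered.iter().rev().map(|(v, k)| (k.clone(), *v)).collect()
    }
}
```

In terms of the plan above, each `Partial` aggregate would run something like this over its own partition, and the `FinalPartitioned` aggregate would run it again over the handful of rows it receives after repartitioning. State stays bounded by `limit` groups instead of growing with the number of distinct `trace_id` values.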

