alamb commented on PR #8006:
URL: https://github.com/apache/arrow-datafusion/pull/8006#issuecomment-1791024449

One of the internal test failures looks like the following.

The input looks like:
```
2023-11-02T15:58:06.601675Z TRACE log: Optimized physical plan by CombinePartialFinalAggregate:
OutputRequirementExec
  SortExec: expr=[time@1 ASC NULLS LAST]
    CoalescePartitionsExec
      ProjectionExec: expr=[cpu as iox::measurement, time@0 as time, (selector_last(sum_idle,time)@1).[value] as last, (selector_last(sum_system,time)@2).[value] as last_1]
        AggregateExec: mode=FinalPartitioned, gby=[time@0 as time], aggr=[selector_last(sum_idle,time), selector_last(sum_system,time)], ordering_mode=Sorted
          SortPreservingRepartitionExec: partitioning=Hash([time@0], 16), input_partitions=16, sort_exprs=time@0 ASC NULLS LAST
            AggregateExec: mode=Partial, gby=[date_bin(10000000000, time@0, 0) as time], aggr=[selector_last(sum_idle,time), selector_last(sum_system,time)]
              RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1
                SortExec: expr=[time@0 ASC NULLS LAST]
                  CoalescePartitionsExec
                    ProjectionExec: expr=[time@0 as time, SUM(cpu.usage_idle)@1 as sum_idle, SUM(cpu.usage_system)@2 as sum_system]
                      AggregateExec: mode=FinalPartitioned, gby=[time@0 as time], aggr=[SUM(cpu.usage_idle), SUM(cpu.usage_system)]
                        RepartitionExec: partitioning=Hash([time@0], 16), input_partitions=16
                          AggregateExec: mode=Partial, gby=[date_bin(10000000000, time@0, 0) as time], aggr=[SUM(cpu.usage_idle), SUM(cpu.usage_system)]
                            RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1
                              ProjectionExec: expr=[time@1 as time, usage_idle@2 as usage_idle, usage_system@3 as usage_system]
                                FilterExec: date_bin(10000000000, time@1, 0) <= 1698940686290451000 AND time@1 <= 1698940686290451000 AND cpu@0 = cpu-total
                                  ParquetExec: file_groups={1 group: [[2/8/0649f0e8b1abed092a356ec6181369fcf585431d1cc0694a0cc4ab45cf78b49d/0c5ac9b2-f6d4-4004-9036-15412da47647.parquet]]}, projection=[cpu, time, usage_idle, usage_system], predicate=date_bin(10000000000, time@2, 0) <= 1698940686290451000 AND time@2 <= 1698940686290451000 AND cpu@0 = cpu-total, pruning_predicate=time_min@0 <= 1698940686290451000 AND cpu_min@1 <= cpu-total AND cpu-total <= cpu_max@2
```
   
But after EnforceSorting, the `SortPreservingRepartitionExec` (marked with `---->`) no longer seems to have any sort exprs:
```
2023-11-02T15:58:06.605925Z TRACE log: Optimized physical plan by EnforceSorting:
OutputRequirementExec
  SortPreservingMergeExec: [time@1 ASC NULLS LAST]
    SortExec: expr=[time@1 ASC NULLS LAST]
      ProjectionExec: expr=[cpu as iox::measurement, time@0 as time, (selector_last(sum_idle,time)@1).[value] as last, (selector_last(sum_system,time)@2).[value] as last_1]
        AggregateExec: mode=FinalPartitioned, gby=[time@0 as time], aggr=[selector_last(sum_idle,time), selector_last(sum_system,time)]
    ----> SortPreservingRepartitionExec: partitioning=Hash([time@0], 16), input_partitions=16
            AggregateExec: mode=Partial, gby=[date_bin(10000000000, time@0, 0) as time], aggr=[selector_last(sum_idle,time), selector_last(sum_system,time)]
              RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=16
                ProjectionExec: expr=[time@0 as time, SUM(cpu.usage_idle)@1 as sum_idle, SUM(cpu.usage_system)@2 as sum_system]
                  AggregateExec: mode=FinalPartitioned, gby=[time@0 as time], aggr=[SUM(cpu.usage_idle), SUM(cpu.usage_system)]
                    RepartitionExec: partitioning=Hash([time@0], 16), input_partitions=16
                      AggregateExec: mode=Partial, gby=[date_bin(10000000000, time@0, 0) as time], aggr=[SUM(cpu.usage_idle), SUM(cpu.usage_system)]
                        RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1
                          ProjectionExec: expr=[time@1 as time, usage_idle@2 as usage_idle, usage_system@3 as usage_system]
                            FilterExec: date_bin(10000000000, time@1, 0) <= 1698940686290451000 AND time@1 <= 1698940686290451000 AND cpu@0 = cpu-total
                              ParquetExec: file_groups={1 group: [[2/8/0649f0e8b1abed092a356ec6181369fcf585431d1cc0694a0cc4ab45cf78b49d/0c5ac9b2-f6d4-4004-9036-15412da47647.parquet]]}, projection=[cpu, time, usage_idle, usage_system], predicate=date_bin(10000000000, time@2, 0) <= 1698940686290451000 AND time@2 <= 1698940686290451000 AND cpu@0 = cpu-total, pruning_predicate=time_min@0 <= 1698940686290451000 AND cpu_min@1 <= cpu-total AND cpu-total <= cpu_max@2
```
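
To make the symptom concrete, here is a toy Rust sketch of an invariant one might check over a plan tree: flag any sort-preserving repartition whose sort expressions are empty. In the input plan the marked node carried `sort_exprs=time@0 ASC NULLS LAST`; after EnforceSorting it carries none. The `PlanNode` enum, `check_sort_preserving_repartitions`, `op`, and `after_enforce_sorting` are all hypothetical and are not DataFusion's `ExecutionPlan` API. The tree mirrors only the top of the post-EnforceSorting plan.

```rust
/// Toy, hypothetical stand-in for a physical plan node. This is NOT
/// DataFusion's `ExecutionPlan` API; it only models what the check needs.
enum PlanNode {
    /// An order-preserving repartition and the ordering it claims to keep.
    SortPreservingRepartition {
        sort_exprs: Vec<String>,
        input: Box<PlanNode>,
    },
    /// Any other operator, identified only by its display name.
    Other {
        name: String,
        children: Vec<PlanNode>,
    },
}

/// Recursively walks `node` and records the path of every
/// sort-preserving repartition whose `sort_exprs` list is empty.
fn check_sort_preserving_repartitions(node: &PlanNode, path: &str, out: &mut Vec<String>) {
    match node {
        PlanNode::SortPreservingRepartition { sort_exprs, input } => {
            let here = format!("{path}/SortPreservingRepartitionExec");
            if sort_exprs.is_empty() {
                out.push(here.clone());
            }
            check_sort_preserving_repartitions(input, &here, out);
        }
        PlanNode::Other { name, children } => {
            let here = format!("{path}/{name}");
            for child in children {
                check_sort_preserving_repartitions(child, &here, out);
            }
        }
    }
}

/// Builds a generic operator node with the given children.
fn op(name: &str, children: Vec<PlanNode>) -> PlanNode {
    PlanNode::Other { name: name.to_string(), children }
}

/// Top of the plan after EnforceSorting (OutputRequirementExec and the
/// subtree below the partial aggregate are omitted for brevity).
fn after_enforce_sorting() -> PlanNode {
    let repartition = PlanNode::SortPreservingRepartition {
        sort_exprs: vec![], // the ordering was dropped here
        input: Box::new(op("AggregateExec", vec![])),
    };
    op(
        "SortPreservingMergeExec",
        vec![op(
            "SortExec",
            vec![op("ProjectionExec", vec![op("AggregateExec", vec![repartition])])],
        )],
    )
}

fn main() {
    let mut problems = Vec::new();
    check_sort_preserving_repartitions(&after_enforce_sorting(), "", &mut problems);
    for p in &problems {
        println!("missing sort_exprs: {p}");
    }
}
```

Run on this toy tree, it reports a single node: `/SortPreservingMergeExec/SortExec/ProjectionExec/AggregateExec/SortPreservingRepartitionExec`.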
