[GitHub] [arrow-datafusion] alamb opened a new issue, #4967: Sometimes Filters are not repartitioned when they could be

GitBox Wed, 18 Jan 2023 04:18:28 -0800


alamb opened a new issue, #4967:
URL: https://github.com/apache/arrow-datafusion/issues/4967


   **Describe the bug**
   
   
   We previously had a plan like the one shown below, where a `RepartitionExec` was added before the filter to increase parallelism.
   
   However, after upgrading DataFusion, the `RepartitionExec` is no longer there. I think this is a slightly worse plan, because the filter can no longer run in parallel.
   
   
   ```
   FilterExec: tag@2 = A
    RepartitionExec: partitioning=RoundRobinBatch(4)  <--- This RepartitionExec has been removed
      DeduplicateExec: [tag@2 ASC,time@3 ASC]
       SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
         UnionExec
          ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/00000000-0000-0000-0000-000000000000.parquet]]}, predicate=tag = Dictionary(Int32, Utf8("A")), pruning_predicate=tag_min@0 <= A AND A <= tag_max@1, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
          SortExec: [tag@2 ASC,time@3 ASC]
            RecordBatchesExec: batches_groups=1 batches=1
   ```
   
   
   **To Reproduce**
   I am working on a reproducer.
   
   **Expected behavior**
   A `RepartitionExec` should be added if it will increase parallelism for filtering.
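
To make the expected behavior concrete, here is a rough, self-contained sketch of the decision in Rust. It is not DataFusion's optimizer code: the `Plan` enum, `output_partitions`, and `add_repartition` are hypothetical names invented for illustration, and the sketch ignores ordering requirements that a real rule would have to respect. It only shows the idea that a round-robin repartition is worth inserting below a filter when the filter's input has fewer partitions than the target.

```rust
/// Toy stand-in for a physical plan node (hypothetical; NOT DataFusion's
/// `ExecutionPlan` trait).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf that produces `partitions` output partitions.
    Source { partitions: usize },
    /// Merges all input partitions into one (similar in spirit to
    /// `SortPreservingMergeExec`).
    Merge(Box<Plan>),
    /// Row-wise filter; runs once per input partition.
    Filter { predicate: String, input: Box<Plan> },
    /// Round-robin repartition into `n` output partitions.
    RoundRobin { n: usize, input: Box<Plan> },
}

impl Plan {
    /// Number of partitions this node emits.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Source { partitions } => *partitions,
            Plan::Merge(_) => 1,
            Plan::Filter { input, .. } => input.output_partitions(),
            Plan::RoundRobin { n, .. } => *n,
        }
    }
}

/// Insert a round-robin repartition below every filter whose input has
/// fewer than `target` partitions. Assumption: the filter does not need
/// its input ordering, so shuffling batches is safe.
fn add_repartition(plan: Plan, target: usize) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => {
            let input = add_repartition(*input, target);
            let input = if input.output_partitions() < target {
                Plan::RoundRobin { n: target, input: Box::new(input) }
            } else {
                input
            };
            Plan::Filter { predicate, input: Box::new(input) }
        }
        Plan::Merge(input) => Plan::Merge(Box::new(add_repartition(*input, target))),
        Plan::RoundRobin { n, input } => Plan::RoundRobin {
            n,
            input: Box::new(add_repartition(*input, target)),
        },
        leaf @ Plan::Source { .. } => leaf,
    }
}
```

Under these assumptions, a filter sitting on top of a single-partition merge gets a `RoundRobin` node with the target partition count, while a filter whose input already has at least that many partitions is left unchanged.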
   
   **Additional context**
   We found this while upgrading IOx:
   
   https://github.com/influxdata/influxdb_iox/pull/6603 -- see 
https://github.com/influxdata/influxdb_iox/pull/6603/files#r1072606494

