alamb opened a new issue, #4967:
URL: https://github.com/apache/arrow-datafusion/issues/4967
**Describe the bug**
We previously had a plan like the one below, in which a `RepartitionExec` was
added before the filter in order to increase parallelism.
However, after upgrading DataFusion, the `RepartitionExec` is no longer
there. I think this is a slightly worse plan, because the filter can no longer
be run in parallel.
```
FilterExec: tag@2 = A
  RepartitionExec: partitioning=RoundRobinBatch(4)    <--- This RepartitionExec has been removed
    DeduplicateExec: [tag@2 ASC,time@3 ASC]
      SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
        UnionExec
          ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/00000000-0000-0000-0000-000000000000.parquet]]}, predicate=tag = Dictionary(Int32, Utf8("A")), pruning_predicate=tag_min@0 <= A AND A <= tag_max@1, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
          SortExec: [tag@2 ASC,time@3 ASC]
            RecordBatchesExec: batches_groups=1 batches=1
```
**To Reproduce**
I am working on a reproducer
**Expected behavior**
A `RepartitionExec` should be added whenever it would increase parallelism for
filtering.
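
To illustrate why the missing repartition matters, here is a minimal toy sketch
in plain Rust (standard library only). It is not DataFusion code: `round_robin`
and `filter_partitions` are hypothetical helpers that only mimic the idea of
`RoundRobinBatch(4)` feeding a filter, under the assumption that each output
partition can be filtered on its own thread.

```rust
use std::thread;

/// Hypothetical stand-in for RoundRobinBatch(n): deal the input batches out
/// to `n` output partitions in turn.
fn round_robin(batches: Vec<Vec<String>>, n: usize) -> Vec<Vec<Vec<String>>> {
    assert!(n > 0, "need at least one output partition");
    let mut partitions: Vec<Vec<Vec<String>>> = (0..n).map(|_| Vec::new()).collect();
    for (i, batch) in batches.into_iter().enumerate() {
        partitions[i % n].push(batch);
    }
    partitions
}

/// Hypothetical stand-in for a `tag = <value>` filter: each partition is
/// scanned on its own thread, and the number of matching rows is returned
/// per partition.
fn filter_partitions(partitions: Vec<Vec<Vec<String>>>, tag: &str) -> Vec<usize> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|partition| {
            let tag = tag.to_string();
            thread::spawn(move || {
                partition
                    .iter()
                    .flatten()
                    .filter(|value| value.as_str() == tag.as_str())
                    .count()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("filter thread panicked"))
        .collect()
}

fn main() {
    // Eight small batches arriving on a single input partition.
    let batches: Vec<Vec<String>> = (0..8)
        .map(|_| vec!["A".to_string(), "B".to_string()])
        .collect();

    // Without repartitioning: one partition, so only one filter thread.
    let single = filter_partitions(vec![batches.clone()], "A");
    println!("no repartition:   {:?}", single);

    // With round-robin repartitioning: four partitions, four filter threads.
    let spread = filter_partitions(round_robin(batches, 4), "A");
    println!("round robin of 4: {:?}", spread);
}
```

Both runs match the same total number of rows. The difference is only in how
many partitions the filter work is spread across, which is the parallelism
that is lost when the repartition step is dropped from the plan.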
**Additional context**
We found this while upgrading IOx:
https://github.com/influxdata/influxdb_iox/pull/6603 -- see
https://github.com/influxdata/influxdb_iox/pull/6603/files#r1072606494