haohuaijin commented on issue #8374: URL: https://github.com/apache/arrow-datafusion/issues/8374#issuecomment-1833941569
After doing some research, I found that this error is caused by the
`ProjectionPushdown` rule in the physical optimizer:
```
| physical_plan after OutputRequirements | ProjectionExec: expr=[column1@0 as column1, column2@1 as column2]
|                                        |   SortPreservingMergeExec: [column1@0 ASC NULLS LAST,column2@2 ASC NULLS LAST]
|                                        |     SortExec: expr=[column1@0 ASC NULLS LAST,column2@2 ASC NULLS LAST]
|                                        |       ProjectionExec: expr=[column1@0 as column1, column2@1 as column2, column2@3 as column2] <-- before we have column2@3(e.column2)
|                                        |         CoalesceBatchesExec: target_batch_size=8192
|                                        |           HashJoinExec: mode=Partitioned, join_type=Inner, on=[(column1@0, column1@0)]
|                                        |             CoalesceBatchesExec: target_batch_size=8192
|                                        |               RepartitionExec: partitioning=Hash([column1@0], 24), input_partitions=1
|                                        |                 MemoryExec: partitions=1, partition_sizes=[1]
|                                        |             CoalesceBatchesExec: target_batch_size=8192
|                                        |               RepartitionExec: partitioning=Hash([column1@0], 24), input_partitions=1
|                                        |                 MemoryExec: partitions=1, partition_sizes=[1]
| physical_plan after PipelineChecker    | SAME TEXT AS ABOVE
| physical_plan after LimitAggregation   | SAME TEXT AS ABOVE
| physical_plan after ProjectionPushdown | SortPreservingMergeExec: [column1@0 ASC NULLS LAST,column2@1 ASC NULLS LAST]
|                                        |   SortExec: expr=[column1@0 ASC NULLS LAST,column2@1 ASC NULLS LAST]
|                                        |     ProjectionExec: expr=[column1@0 as column1, column2@1 as column2] <-- after we eliminate column2@3(e.column2)
|                                        |       CoalesceBatchesExec: target_batch_size=8192
|                                        |         HashJoinExec: mode=Partitioned, join_type=Inner, on=[(column1@0, column1@0)]
|                                        |           CoalesceBatchesExec: target_batch_size=8192
|                                        |             RepartitionExec: partitioning=Hash([column1@0], 24), input_partitions=1
|                                        |               MemoryExec: partitions=1, partition_sizes=[1]
|                                        |           CoalesceBatchesExec: target_batch_size=8192
|                                        |             RepartitionExec: partitioning=Hash([column1@0], 24), input_partitions=1
|                                        |               MemoryExec: partitions=1, partition_sizes=[1]
```
The reason for this rewrite may be that we use only the column name to
identify a column in the code below:
https://github.com/apache/arrow-datafusion/blob/06bbe1298fa8aa042b6a6462e55b2890969d884a/datafusion/core/src/physical_optimizer/projection_pushdown.rs#L866-L872
When two columns have identical names (here `column2@1` and `column2@3`),
they cannot be told apart by name, and the error arises.
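
To illustrate the suspected problem, here is a minimal sketch. It is not the
actual code from `projection_pushdown.rs`; the `Column` struct and both lookup
functions are hypothetical stand-ins that only mimic the `name@index` notation
from the plans above. It shows how a name-only match picks the wrong entry
when two projected columns share a name, while matching on name and index
picks the intended one.

```rust
/// Hypothetical stand-in for a physical column reference such as `column2@3`.
/// This is NOT DataFusion's own type, just an illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Column {
    name: String,
    index: usize,
}

impl Column {
    fn new(name: &str, index: usize) -> Self {
        Column { name: name.to_string(), index }
    }
}

/// Name-only lookup: returns the position of the first projected column
/// whose name equals the target's name, ignoring the index.
fn find_by_name(projection: &[Column], target: &Column) -> Option<usize> {
    projection.iter().position(|col| col.name == target.name)
}

/// Name-and-index lookup: both fields must match, so columns that share
/// a name but come from different input positions stay distinct.
fn find_by_name_and_index(projection: &[Column], target: &Column) -> Option<usize> {
    projection
        .iter()
        .position(|col| col.name == target.name && col.index == target.index)
}
```

With the projection `[column1@0, column2@1, column2@3]` from the plan above
and the target `column2@3`, the name-only lookup returns position 1 (the
entry for `column2@1`), whereas the name-and-index lookup returns position 2.
That mirrors the sort key changing from `column2@2` to `column2@1` after the
rule runs.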
