alamb commented on code in PR #7247:
URL: https://github.com/apache/arrow-datafusion/pull/7247#discussion_r1290235023
##########
datafusion/core/tests/sqllogictests/test_files/join_disable_repartition_joins.slt:
##########
@@ -56,11 +56,13 @@ Limit: skip=0, fetch=5
----------TableScan: annotated_data projection=[a, c]
physical_plan
GlobalLimitExec: skip=0, fetch=5
---ProjectionExec: expr=[a@1 as a]
-----CoalesceBatchesExec: target_batch_size=8192
-------HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(c@0, c@1)]
---------CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[c],
has_header=true
---------CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[a, c],
output_ordering=[a@0 ASC NULLS LAST], has_header=true
+--SortPreservingMergeExec: [a@0 ASC NULLS LAST], fetch=5
Review Comment:
I agree this plan looks better, but on the other hand it is part of a test
that is supposed to disable repartitioning join inputs 🤔
```
set datafusion.optimizer.repartition_joins = false;
```
But I think it is ok, because `datafusion.optimizer.repartition_joins`
controls repartitioning the left (build) side which is not repartitioned
##########
datafusion/core/src/physical_plan/mod.rs:
##########
@@ -135,13 +135,13 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
///
/// The default implementation returns `true` unless this operator
/// has signalled it requires a single child input partition.
- fn benefits_from_input_partitioning(&self) -> bool {
Review Comment:
Perhaps we can also update the documentation comment here to explain that
the returned vector has an entry for each input
##########
datafusion/core/tests/sqllogictests/test_files/join_disable_repartition_joins.slt:
##########
@@ -56,11 +56,13 @@ Limit: skip=0, fetch=5
----------TableScan: annotated_data projection=[a, c]
physical_plan
GlobalLimitExec: skip=0, fetch=5
---ProjectionExec: expr=[a@1 as a]
-----CoalesceBatchesExec: target_batch_size=8192
-------HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(c@0, c@1)]
---------CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[c],
has_header=true
---------CsvExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[a, c],
output_ordering=[a@0 ASC NULLS LAST], has_header=true
+--SortPreservingMergeExec: [a@0 ASC NULLS LAST], fetch=5
Review Comment:
I agree this plan looks better, but on the other hand it is part of a test
that is supposed to disable repartitioning join inputs 🤔
```
set datafusion.optimizer.repartition_joins = false;
```
But I think it is ok, because `datafusion.optimizer.repartition_joins`
controls repartitioning the left (build) side which is not repartitioned
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]