NGA-TRAN commented on code in PR #18673:
URL: https://github.com/apache/datafusion/pull/18673#discussion_r2528044884
##########
datafusion/core/tests/physical_optimizer/enforce_distribution.rs:
##########
@@ -1424,19 +1424,19 @@ fn multi_smj_joins() -> Result<()> {
// Should include 6 RepartitionExecs (3 hash, 3 round-robin), 3 SortExecs
JoinType::Inner | JoinType::Left | JoinType::LeftSemi | JoinType::LeftAnti => {
assert_plan!(plan_distrib, @r"
-SortMergeJoin: join_type=..., on=[(a@0, c@2)]
- SortMergeJoin: join_type=..., on=[(a@0, b1@1)]
- SortExec: expr=[a@0 ASC], preserve_partitioning=[true]
- RepartitionExec: partitioning=Hash([a@0], 10), input_partitions=1
- DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
- SortExec: expr=[b1@1 ASC], preserve_partitioning=[true]
- RepartitionExec: partitioning=Hash([b1@1], 10), input_partitions=1
- ProjectionExec: expr=[a@0 as a1, b@1 as b1, c@2 as c1, d@3 as d1, e@4 as e1]
- DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
- SortExec: expr=[c@2 ASC], preserve_partitioning=[true]
- RepartitionExec: partitioning=Hash([c@2], 10), input_partitions=1
- DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
-");
+ SortMergeJoin: join_type=..., on=[(a@0, c@2)]
+ SortMergeJoin: join_type=..., on=[(a@0, b1@1)]
+ SortExec: expr=[a@0 ASC], preserve_partitioning=[true]
+ RepartitionExec: partitioning=Hash([a@0], 10), input_partitions=1, maintains_sort_order=true
+ DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
+ SortExec: expr=[b1@1 ASC], preserve_partitioning=[true]
+ RepartitionExec: partitioning=Hash([b1@1], 10), input_partitions=1, maintains_sort_order=true
+ ProjectionExec: expr=[a@0 as a1, b@1 as b1, c@2 as c1, d@3 as d1, e@4 as e1]
+ DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], file_type=parquet
Review Comment:
I do not see that the data is sorted here, so the SortExec above is correct and
the `maintains_sort_order=true` is misleading.
@ruchirK: You may need to look into the fix again. It does not do exactly what
we want. What we want is to display something like `is sorted` only if the data
is sorted and there is 1 partition. It seems you show everything as sorted
whenever there is one partition, which is not correct.
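
A minimal sketch of the condition described above. Every name in it (`InputInfo`, `label_if_single_partition`, `label_if_sorted_and_single_partition`) is hypothetical and used only for illustration; none of it is DataFusion's API or the code in this PR. It contrasts the behavior the plan output suggests with the behavior being asked for:

```rust
/// Hypothetical summary of the input to a repartition operator.
/// This is an illustration only, not a DataFusion type.
struct InputInfo {
    /// Number of partitions feeding the operator.
    partitions: usize,
    /// Whether the input actually declares a sort order.
    is_sorted: bool,
}

/// Behavior the plan output above suggests: any single-partition
/// input gets the label, whether or not it is sorted.
fn label_if_single_partition(input: &InputInfo) -> bool {
    input.partitions == 1
}

/// Behavior requested in the review: show the label only when the
/// input is sorted AND there is exactly one input partition.
fn label_if_sorted_and_single_partition(input: &InputInfo) -> bool {
    input.is_sorted && input.partitions == 1
}
```

With an unsorted single-partition input, as in the plan above, the first function returns `true` and the second returns `false`, which is the difference this comment is pointing at.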