Re: [PR] Enforce sorting handle fetchable operators, add option to repartition based on row count estimates [datafusion]

via GitHub Fri, 09 Aug 2024 08:03:09 -0700


mustafasrepo commented on code in PR #11875:
URL: https://github.com/apache/datafusion/pull/11875#discussion_r1711636263



##########
datafusion/sqllogictest/test_files/count_star_rule.slt:
##########
@@ -86,10 +86,8 @@ logical_plan
 physical_plan
 01)ProjectionExec: expr=[a@0 as a, count() PARTITION BY [t1.a] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@1 as count_a]
 02)--WindowAggExec: wdw=[count() PARTITION BY [t1.a] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Ok(Field { name: "count() PARTITION BY [t1.a] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)), is_causal: false }]
-03)----SortExec: expr=[a@0 ASC NULLS LAST], preserve_partitioning=[true]
-04)------CoalesceBatchesExec: target_batch_size=8192
-05)--------RepartitionExec: partitioning=Hash([a@0], 4), input_partitions=1
-06)----------MemoryExec: partitions=1, partition_sizes=[1]
+03)----SortExec: expr=[a@0 ASC NULLS LAST], preserve_partitioning=[false]

Review Comment:
   Indeed, with a single partition the hash requirement is trivially 
satisfied. Also, when the row count is too small, increasing the partition 
count with hash repartitioning is not beneficial (the output batches would 
contain only a small number of rows).
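
   The reasoning above can be sketched as a small decision function. This is 
only an illustration, not DataFusion's actual API: `Decision`, `decide`, and 
the `min_rows` threshold are hypothetical names, and the sketch assumes that 
an input with more than one partition is not already hash partitioned on the 
required keys.

```rust
/// Hypothetical outcome of the planning decision (not a DataFusion type).
#[derive(Debug, PartialEq)]
enum Decision {
    /// Leave the single input partition as it is.
    KeepSinglePartition,
    /// Insert a hash repartition on the required keys.
    HashRepartition,
}

/// Hypothetical helper: decide whether to add a hash repartition.
///
/// Assumption: when `input_partitions > 1`, the input is not yet hash
/// partitioned on the required keys, so a repartition is needed.
fn decide(
    input_partitions: usize,
    target_partitions: usize,
    estimated_rows: Option<usize>,
    min_rows: usize,
) -> Decision {
    if input_partitions > 1 {
        // Equal keys may live in different partitions, so the hash
        // requirement has to be enforced.
        return Decision::HashRepartition;
    }
    // From here on there is exactly one partition (or none): all rows with
    // equal keys are co-located, so the hash requirement already holds.
    // Repartitioning would only be done to gain parallelism.
    if target_partitions <= 1 {
        return Decision::KeepSinglePartition;
    }
    match estimated_rows {
        // Too few rows: splitting them would only produce tiny batches.
        Some(rows) if rows < min_rows => Decision::KeepSinglePartition,
        // Enough rows, or no estimate available: repartition for parallelism.
        _ => Decision::HashRepartition,
    }
}
```

   For the plan in this diff (one input partition holding a single small 
batch), such a check would keep the single partition, which matches the 
removal of `RepartitionExec` and `CoalesceBatchesExec` in the new plan.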



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

