gene-bordegaray commented on code in PR #19304:
URL: https://github.com/apache/datafusion/pull/19304#discussion_r2632618894


##########
datafusion/common/src/config.rs:
##########
@@ -1000,6 +1000,34 @@ config_namespace! {
         /// ```
         pub repartition_sorts: bool, default = true
 
+        /// Partition count threshold for subset satisfaction optimization.

Review Comment:
   I chose this value because it seemed like a good fit for the TPC-H benchmark:
it still allows repartitioning to increase parallelism, while the property
still kicks in often enough to get test coverage. Let me know if you think
there is a better way I should approach this.
   
   I do agree that it might become a problem under extreme skew, but I believe
users will more often than not repartition to maximize parallelism early in the
plan (say, at the first repartition). So I thought this would be a rare enough
case that it is more beneficial to let users take advantage of this property
more often.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.