alamb commented on PR #23231: URL: https://github.com/apache/datafusion/pull/23231#issuecomment-4858610653
> @stuhood I am most concerned with implementing physical layer behavior before having a real use for it that we can represent. What would the use case of being able to repartition on range right now be? Do you have a use case where you would like to physically insert a repartition on range? Maybe this is a good place to start the conversation on where and how this should be decided 🤔

The main use case we have at the moment for range partitioning is when the input source data is already range partitioned. The point of the work in this epic is for DataFusion to know about that (pre-existing) partitioning and take advantage of it.

I think you guys are talking about having the optimizer decide to repartition data into ranges (e.g. when it wants to add more parallelism to the plan). That would probably need to be a cost-based decision based on statistics (like value distributions) that we don't yet have in DataFusion (and maybe never will have).
