alamb commented on PR #23231:
URL: https://github.com/apache/datafusion/pull/23231#issuecomment-4858610653

   > @stuhood I am most concerned with implementing physical layer behavior 
before having a real use for it that we can represent. What would the use case 
of being able to repartition on range right now be? Do you have a use case 
where you would like to physically insert a repartition on range? Maybe this is 
a good place to start the conversation on where and how this should be decided 🤔
   
   The main use case we have at the moment for range partitioning is when the 
input source data is already range partitioned. The point of the work in 
this epic is for DataFusion to know about that (pre-existing) partitioning and 
take advantage of it.
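
   To make "already range partitioned" concrete, here is a minimal sketch in 
plain Rust. It does not use any DataFusion API; the names `partition_for` and 
`partitions_overlapping`, and the choice to describe partitions by a sorted 
list of split points, are assumptions made only for illustration.

```rust
/// Hypothetical helper, not a DataFusion API.
///
/// `bounds` holds the sorted split points between range partitions.
/// With `bounds = [10, 20]` there are three partitions:
///   partition 0: value < 10
///   partition 1: 10 <= value < 20
///   partition 2: value >= 20
fn partition_for(bounds: &[i64], value: i64) -> usize {
    // `partition_point` returns how many leading split points satisfy
    // `b <= value`, which is exactly the index of the owning partition.
    bounds.partition_point(|&b| b <= value)
}

/// Hypothetical helper: the partitions that can contain any value in the
/// inclusive range `[lo, hi]`. Every other partition can be skipped.
fn partitions_overlapping(
    bounds: &[i64],
    lo: i64,
    hi: i64,
) -> std::ops::RangeInclusive<usize> {
    partition_for(bounds, lo)..=partition_for(bounds, hi)
}
```

   If the engine knows the split points of the source, a predicate such as 
`12 <= x AND x <= 18` only needs to touch the partitions returned by 
`partitions_overlapping(&[10, 20], 12, 18)`, and no repartitioning is required 
to get that benefit.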
   
   I think you guys are talking about having the optimizer decide to 
repartition data into ranges (e.g. when it wants to add more parallelism to the 
plan). That would probably need to be a cost-based decision based on statistics 
(like value distributions) that we don't yet have in DataFusion (and maybe 
never will have). 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

