Re: [PR] feat: physical execution for range partitioning [datafusion]

via GitHub Tue, 30 Jun 2026 09:46:38 -0700


stuhood commented on PR #23231:
URL: https://github.com/apache/datafusion/pull/23231#issuecomment-4845877514


   > It is unclear to me exactly the best way to make this decision and if / 
how we can recognize to use one or the other. Should this be something that 
users specify as a config? Is there some way to detect this? Should we only 
repartition to range if it is to a superset of the current range partitioning 
(example: data partitioned on `day` -> repartition to `hour`)?
   
   From briefly looking around, I only see a few cases where a logical 
optimizer might want to request Range rather than Hash (things like non-equi 
joins, re-organizing data for _output_ that preserves partitioning, global 
window functions.)
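
To illustrate why those cases favor Range: range partitioning keeps partition indices ordered by key, while hash partitioning scatters neighboring keys. The following is only a minimal sketch under assumed names (`range_partition`, `hash_partition`, and the split points are hypothetical and are not DataFusion's API or this PR's implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: assigns `key` to a partition using sorted split points.
/// With N split points there are N + 1 partitions; a key equal to a split
/// point goes to the partition on the right.
fn range_partition(key: i64, split_points: &[i64]) -> usize {
    // Counts how many split points are <= key (split_points must be sorted).
    split_points.partition_point(|&s| s <= key)
}

/// Hypothetical helper: assigns `key` to a partition by hashing it.
/// Neighboring keys may land in unrelated partitions.
fn hash_partition(key: i64, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

fn main() {
    let split_points = [10, 20, 30]; // assumed boundaries, 4 partitions
    let keys = [3, 12, 19, 25, 31];
    for &key in keys.iter() {
        println!(
            "key={key} range={} hash={}",
            range_partition(key, &split_points),
            hash_partition(key, split_points.len() + 1)
        );
    }
}
```

Because `range_partition` is monotonic in the key, a predicate such as `a.key < b.key` only needs to compare a partition against partitions at or beyond its own index, which is the property a non-equi join or an order-preserving output would rely on.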
   
   > Before really diving into this we should step back and plan how 
repartitioning will work from a high level first before diving into the nitty 
gritty.
   
   But the choice to introduce Range partitioning would be a _logical_ 
decision, right? So, while I agree that changing logical optimizers to request 
Range would take a lot of thought and design, implementing the physical side 
(this PR) doesn't seem to be blocked on that? Or are you concerned that the API 
might still shift, or that it won't have enough test coverage?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]