stuhood commented on PR #23231: URL: https://github.com/apache/datafusion/pull/23231#issuecomment-4845877514
> It is unclear to me exactly the best way to make this decision and if / how we can recognize to use one or the other. Should this be something that users specify as a config? Is there some way to detect this? Should we only repartition to range if it is to a superset of the current range partitioning (example: data partitioned on `day` -> repartition to `hour`)?

From briefly looking around, I only see a few cases where a logical optimizer might want to request Range rather than Hash (things like non-equi joins, re-organizing data for _output_ that preserves partitioning, global window functions).

> Before really diving into this we should step back and plan how repartitioning will work from a high level first before diving into the nitty gritty.

But the choice to introduce Range partitioning would be a _logical_ decision, right? So, while I agree that changing logical optimizers to request Range would take a lot of thought and design, implementing the physical side (this PR) doesn't seem to be blocked on that? Or are you concerned that the API might still shift, or that it won't have enough test coverage?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
