Hello everyone,

DataFusion currently has three partitioning variants: Hash, RoundRobinBatch,
and UnknownPartitioning. It cannot accurately represent some partitioning
schemes, which makes optimizer and planning behavior brittle (relevant
discussion thread <https://github.com/apache/datafusion/issues/21207>).
A few community members have designed a Range partitioning variant to
accurately describe the range-partitioning schemes users have today. The first
PR <https://github.com/apache/datafusion/pull/22207> is purely mechanical:
it adds the model and contract and marks unsupported call sites without
changing planning behavior. Follow-up work will add planning and execution
support incrementally:

   - Implement Range Partitioning Planning
   <https://github.com/apache/datafusion/issues/22397>
   - Implement Range Repartitioning
   <https://github.com/apache/datafusion/issues/22395>
   - Expose Range Partitioning Across FFI Boundaries
   <https://github.com/apache/datafusion/issues/22394>


I would appreciate any input. Feel free to join the conversation here
<https://github.com/apache/datafusion/issues/21992>.

Thanks,
Gene
