gene-bordegaray opened a new issue, #23236:
URL: https://github.com/apache/datafusion/issues/23236

   ### Is your feature request related to a problem or challenge?
   
   `Distribution::HashPartitioned` is currently documented as requiring rows 
with equal key values to land in the same partition. That is a key-colocation 
contract, not necessarily a requirement that the existing input is physically 
hash partitioned.
   
   As range partitioning support expands, this name is increasingly confusing: 
range partitioning can satisfy some single-input key-colocation requirements, 
while multi-input operators such as joins still need additional co-partitioning 
checks.
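
To make the distinction concrete, here is a minimal, self-contained sketch using only the Rust standard library. It does not use DataFusion's types: `hash_partition`, `range_partition`, `partition_by`, `is_key_colocated`, and `is_co_partitioned` are hypothetical helpers invented for illustration, and keys are plain `i64` values. It shows that both hash and range partitioning keep equal keys together within a single input, while two range-partitioned inputs with different bounds are each colocated yet not co-partitioned.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: pick one of `n` partitions by hashing the key.
fn hash_partition(key: i64, n: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % n as u64) as usize
}

/// Hypothetical helper: pick a range partition. `bounds` holds sorted,
/// exclusive upper bounds; keys at or above the last bound go to the
/// final partition, so there are `bounds.len() + 1` partitions.
fn range_partition(key: i64, bounds: &[i64]) -> usize {
    bounds.partition_point(|&b| b <= key)
}

/// Distribute rows into `n` partitions. `assign` sees the row index and
/// the key, so index-based schemes such as round robin can be modelled.
fn partition_by(
    rows: &[i64],
    n: usize,
    assign: impl Fn(usize, i64) -> usize,
) -> Vec<Vec<i64>> {
    let mut parts = vec![Vec::new(); n];
    for (i, &key) in rows.iter().enumerate() {
        parts[assign(i, key)].push(key);
    }
    parts
}

/// Key colocation for one input: no key appears in two partitions.
fn is_key_colocated(partitions: &[Vec<i64>]) -> bool {
    let mut owner: HashMap<i64, usize> = HashMap::new();
    for (p, rows) in partitions.iter().enumerate() {
        for &key in rows {
            if *owner.entry(key).or_insert(p) != p {
                return false;
            }
        }
    }
    true
}

/// Co-partitioning for two inputs: every key present on both sides
/// sits at the same partition index on both sides.
fn is_co_partitioned(left: &[Vec<i64>], right: &[Vec<i64>]) -> bool {
    let mut owner: HashMap<i64, usize> = HashMap::new();
    for (p, rows) in left.iter().enumerate() {
        for &key in rows {
            owner.insert(key, p);
        }
    }
    right.iter().enumerate().all(|(p, rows)| {
        rows.iter()
            .all(|key| owner.get(key).map_or(true, |&lp| lp == p))
    })
}
```

For example, partitioning the keys `[1, 15, 1, 25, 15, 7]` with bounds `[10, 20]` passes `is_key_colocated`, and so does partitioning `[7, 15]` with bounds `[5, 20]`, but `is_co_partitioned` rejects the pair because key `7` lands in partition 0 on one side and partition 1 on the other. A round-robin assignment (`|i, _| i % 3`) fails even the single-input check.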
   
   ### Describe the solution you'd like
   
   Clarify the long-term API direction for this distribution requirement. 
Options include:
   
   - keep `HashPartitioned` but document it as historical naming for key 
colocation
   - deprecate / migrate to a `KeyPartitioned` name
   - separately model cross-input co-partitioning requirements for joins 
instead of encoding them as independent per-child distributions
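
As a rough illustration of the second and third options, the following toy model separates a per-child key-colocation requirement from a cross-input check. Everything in it is hypothetical: `ChildRequirement`, `Layout`, `satisfies`, and `co_partitioned` are stand-ins invented for this sketch and do not mirror DataFusion's actual `Distribution` or `Partitioning` definitions. It also simplifies by requiring an exact key match and by assuming all hash layouts share one hash function.

```rust
/// Toy per-child requirement (hypothetical, not DataFusion's enum).
#[derive(Debug, Clone, PartialEq)]
enum ChildRequirement {
    /// No constraint on how the child is partitioned.
    Unspecified,
    /// Rows with equal values for these keys must share a partition.
    KeyPartitioned(Vec<String>),
}

/// Toy description of how one input is physically partitioned.
#[derive(Debug, Clone, PartialEq)]
enum Layout {
    RoundRobin(usize),
    Hash { keys: Vec<String>, partitions: usize },
    Range { keys: Vec<String>, bounds: Vec<i64> },
}

impl Layout {
    /// Single-input check: hash and range layouts on the required keys
    /// both provide key colocation; round robin does not.
    fn satisfies(&self, req: &ChildRequirement) -> bool {
        match (self, req) {
            (_, ChildRequirement::Unspecified) => true,
            (Layout::Hash { keys, .. }, ChildRequirement::KeyPartitioned(k))
            | (Layout::Range { keys, .. }, ChildRequirement::KeyPartitioned(k)) => keys == k,
            (Layout::RoundRobin(_), ChildRequirement::KeyPartitioned(_)) => false,
        }
    }
}

/// Cross-input check, modelled separately from the per-child requirement:
/// both sides must use the same scheme with the same shape.
/// Assumption: all hash layouts use the same hash function.
fn co_partitioned(left: &Layout, right: &Layout) -> bool {
    match (left, right) {
        (Layout::Hash { partitions: a, .. }, Layout::Hash { partitions: b, .. }) => a == b,
        (Layout::Range { bounds: a, .. }, Layout::Range { bounds: b, .. }) => a == b,
        _ => false,
    }
}
```

In this model, two `Range` layouts on the same key with bounds `[10, 20]` and `[5, 20]` each satisfy `KeyPartitioned`, yet `co_partitioned` returns `false` for the pair, which is the case that independent per-child distributions cannot express.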
   
   This should be handled independently from operator-specific range 
partitioning support so each operator can opt in deliberately.
   
   ### Additional context
   
   Range partitioning epic: #22395
   
   Related work:
   
   - #23191
   - #23184
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

