[GitHub] [arrow-datafusion] metesynnada opened a new issue, #7113: HashJoinExec output order is not correct

via GitHub Thu, 27 Jul 2023 07:46:04 -0700


metesynnada opened a new issue, #7113:
URL: https://github.com/apache/arrow-datafusion/issues/7113


   ### Describe the bug
   
   In the current implementation of `calculate_hash_join_output_order`, we 
assume that the order of the build side is also preserved, so that the join 
output is ordered lexicographically.
   
   For example, if the left (build) side is ordered by `a ASC` and the right 
side by `b ASC`, the output was ordered by `b ASC, a ASC`.
   
   However, after changes to the join hash table implementation, the order of 
the left side is still maintained, but in reverse: the chain pointer starts at 
the last matching row index and ends at the first, e.g., `[5,4,3,2,1]`.
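
The reversal can be reproduced with a minimal sketch of a chained hash table. This is an illustration under assumptions, not the actual DataFusion code: `ChainedJoinMap`, `build`, and `probe` are hypothetical names, keys are plain `i64` values rather than hashes, and row indexes are 0-based. Each key maps to the most recently inserted row, and every row points back to the previous row with the same key, so following the chain visits build-side rows from last to first.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a chained join hash table.
struct ChainedJoinMap {
    /// key -> (index of the most recently inserted row) + 1
    head: HashMap<i64, usize>,
    /// next[i] = (index of the previous row with the same key) + 1,
    /// or 0 when row i is the end of its chain
    next: Vec<usize>,
}

/// Insert build-side rows in their input order.
fn build(keys: &[i64]) -> ChainedJoinMap {
    let mut head = HashMap::new();
    let mut next = vec![0usize; keys.len()];
    for (row, key) in keys.iter().enumerate() {
        // `insert` returns the old head, which becomes this row's successor.
        if let Some(prev) = head.insert(*key, row + 1) {
            next[row] = prev;
        }
    }
    ChainedJoinMap { head, next }
}

/// Follow the chain for `key`, returning build-side row indexes as visited.
fn probe(map: &ChainedJoinMap, key: i64) -> Vec<usize> {
    let mut out = Vec::new();
    let mut cursor = map.head.get(&key).copied().unwrap_or(0);
    while cursor != 0 {
        let row = cursor - 1;
        out.push(row);
        cursor = map.next[row];
    }
    out
}
```

With five build-side rows that share one key, `probe` returns the rows in the order `[4, 3, 2, 1, 0]`, which is the 0-based equivalent of the `[5,4,3,2,1]` pattern above: the build-side order survives within each key, but reversed.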
   
   @Dandandan, I'm interested in your perspective on this. My inclination is 
that maintaining the order lexicographically could significantly benefit 
aggregations. Yet, I'm struggling to identify an obvious solution.
   
   ### To Reproduce
   
   NA
   
   ### Expected behavior
   
   NA
   
   ### Additional context
   
   NA


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

