korowa commented on PR #8020:
URL: 
https://github.com/apache/arrow-datafusion/pull/8020#issuecomment-1806297904

   @metesynnada thank you for your explanation! I've finally got it now, and 
indeed, it has nothing in common with the join operation.
   
   > Why don't we construct the join result in this way?
   
   The main reason is that I underestimated the importance of preserving 
build-side order and reused the current implementation of `update_hash` / 
`build_equal_condition_join_indices`. Now I see that this is incorrect.
   
   Regarding how to fix it prior to merging this PR: I'm going to file a 
separate issue about a FIFO HashMap and about allowing `HashJoinExec` to 
produce output in the correct order (as in your example) while iterating in 
natural order (the 
[reverse](https://github.com/apache/arrow-datafusion/blob/91c9d6f847eda0b5b1d01257b5c24459651d3926/datafusion/physical-plan/src/joins/hash_join.rs#L906)
 iteration that we have now helps to maintain the order, but doesn't allow 
partial output without processing the whole batch).
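
To illustrate the ordering problem, here is a minimal sketch. It assumes a chained layout in which rows with equal keys are linked so that the most recently inserted row is visited first. `ChainedMap`, `probe_forward` and `probe_reversed` are hypothetical names made up for this example, not the actual DataFusion code.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a chained join hash map (hypothetical type).
struct ChainedMap {
    /// key -> (index of the most recently inserted build row with that key) + 1
    head: HashMap<i32, usize>,
    /// next[i] = (index of the previously inserted row with the same key) + 1,
    /// or 0 when row i is the end of its chain
    next: Vec<usize>,
}

impl ChainedMap {
    fn build(keys: &[i32]) -> Self {
        let mut head = HashMap::new();
        let mut next = vec![0; keys.len()];
        for (row, key) in keys.iter().enumerate() {
            // `insert` returns the old head of the chain, if there was one
            if let Some(prev) = head.insert(*key, row + 1) {
                next[row] = prev;
            }
        }
        Self { head, next }
    }

    /// Build-side rows matching `key`, most recently inserted first.
    fn matches(&self, key: i32) -> Vec<usize> {
        let mut out = Vec::new();
        let mut cur = self.head.get(&key).copied().unwrap_or(0);
        while cur != 0 {
            out.push(cur - 1);
            cur = self.next[cur - 1];
        }
        out
    }
}

/// Probe in natural order: pairs could be emitted incrementally, but the
/// build indices for each probe row come out in descending order.
fn probe_forward(map: &ChainedMap, probe: &[i32]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (p, key) in probe.iter().enumerate() {
        for b in map.matches(*key) {
            pairs.push((b, p));
        }
    }
    pairs
}

/// Probe in reverse order and reverse the collected pairs: the order is
/// restored, but nothing can be emitted until the whole batch is processed.
fn probe_reversed(map: &ChainedMap, probe: &[i32]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (p, key) in probe.iter().enumerate().rev() {
        for b in map.matches(*key) {
            pairs.push((b, p));
        }
    }
    pairs.reverse();
    pairs
}
```

With build keys `[1, 2, 1]` and probe keys `[1, 2]`, `probe_forward` yields the (build, probe) pairs `(2, 0), (0, 0), (1, 1)`, while `probe_reversed` yields `(0, 0), (2, 0), (1, 1)`. A FIFO chain would let the forward loop produce the second ordering directly.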
   
   I'll also mark this PR as WIP again, because the current behaviour is incorrect.
   
   Thanks again for your patient explanations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.