korowa commented on PR #8020: URL: https://github.com/apache/arrow-datafusion/pull/8020#issuecomment-1806297904
@metesynnada thank you for your explanation! I finally get it now, and indeed, it has nothing in common with the join operation.

> Why don't we construct the join result in this way?

The main reason is that I underestimated the importance of preserving build-side order and reused the current implementation of `update_hash` / `build_equal_condition_join_indices`. Now I see that this is incorrect.

Regarding how to fix it prior to merging this PR: I'm going to file a separate issue for a FIFO hash map that would allow `HashJoinExec` to produce output in the correct order (as in your example) while iterating in natural order. The [reverse](https://github.com/apache/arrow-datafusion/blob/91c9d6f847eda0b5b1d01257b5c24459651d3926/datafusion/physical-plan/src/joins/hash_join.rs#L906) iteration that we have now helps to maintain the order, but doesn't allow partial output without processing the whole batch.

I'll also mark this PR as WIP again, because the current behaviour is incorrect.

Thanks again for your patient explanations.
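
To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not DataFusion's actual code: `LifoJoinMap`, `join_natural`, and `join_reversed` are hypothetical names, and the chained layout (newest row at the head of each chain) is an assumption made for the example. It shows why a newest-first chain yields build-side indices in descending order under natural iteration, and why the reverse-iteration workaround only produces the right order once the whole probe batch has been processed.

```rust
use std::collections::HashMap;

// Hypothetical, simplified chained hash map (an assumption for illustration).
// `head` maps a key to the 1-based index of the most recently inserted build
// row; `next[i]` links row `i` to the previously inserted row with the same
// key (0 marks the end of the chain).
struct LifoJoinMap {
    head: HashMap<i64, usize>,
    next: Vec<usize>,
}

impl LifoJoinMap {
    fn build(keys: &[i64]) -> Self {
        let mut head = HashMap::new();
        let mut next = vec![0; keys.len()];
        for (row, key) in keys.iter().enumerate() {
            // `insert` returns the old head, which becomes this row's successor.
            if let Some(prev) = head.insert(*key, row + 1) {
                next[row] = prev;
            }
        }
        Self { head, next }
    }

    // Build-side rows matching `key`, newest first (descending row index).
    fn matches(&self, key: i64) -> Vec<usize> {
        let mut out = Vec::new();
        let mut cur = self.head.get(&key).copied().unwrap_or(0);
        while cur != 0 {
            out.push(cur - 1);
            cur = self.next[cur - 1];
        }
        out
    }
}

// Natural probe order: pairs could be emitted incrementally, but build
// indices come out descending within each probe row.
fn join_natural(map: &LifoJoinMap, probe: &[i64]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (p, key) in probe.iter().enumerate() {
        for b in map.matches(*key) {
            out.push((b, p));
        }
    }
    out
}

// Reverse probe order plus a final reversal: the order is restored, but
// nothing can be emitted until the whole probe batch has been visited.
fn join_reversed(map: &LifoJoinMap, probe: &[i64]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (p, key) in probe.iter().enumerate().rev() {
        for b in map.matches(*key) {
            out.push((b, p));
        }
    }
    out.reverse();
    out
}
```

With build keys `[1, 2, 1]` and probe keys `[1, 2]`, `join_natural` returns `[(2, 0), (0, 0), (1, 1)]` as (build, probe) pairs, while `join_reversed` returns `[(0, 0), (2, 0), (1, 1)]`. A FIFO chain would presumably give the second ordering under natural iteration, which is what would make partial output possible.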
