korowa commented on PR #8020: URL: https://github.com/apache/arrow-datafusion/pull/8020#issuecomment-1801202574
@alamb I guess so. It seems like this PR is going to grow large enough to make the review inconvenient and to reach a critical mass of quite important changes 😞

One possible way is to split it into:
1) FIFO `JoinHashMap`, to preserve the natural input order for the build side
2) State management for `HashJoinStream`: it doesn't make much sense to build a state machine only for tracking matched/joined indices; instead, it should be stream-related
3) Applying the current PR on top of the previous two

All three assume retaining the `SymmetricHashJoin` logic as-is, with code duplication if required. The first two parts could potentially be separate issues/PRs, and the third is all about merging (or, more likely, rewriting from scratch), so I'd highly appreciate help with either of the first two. Any thoughts on this plan?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
