korowa commented on PR #8020:
URL: https://github.com/apache/arrow-datafusion/pull/8020#issuecomment-1801202574

   @alamb I guess so -- it seems like it's going to grow large enough to make 
review inconvenient, and to reach a critical mass of fairly significant changes 😞
   
   One possible way is to split it into:
   1) FIFO `JoinHashMap` -- to preserve the natural input order of the build side
   2) State management for `HashJoinStream` -- it doesn't seem to make much 
sense to build a state machine only for tracking matched/joined indices; 
instead, the state machine should cover the stream as a whole
   3) Applying the current PR on top of the previous two
   
   (All three assume retaining the `SymmetricHashJoin` logic as-is, with code 
duplication if required.)
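   To illustrate what "FIFO" could mean for item 1, here is a minimal Rust sketch. It assumes a hash table whose entries point to the head of a per-hash chain of build-side row indices, with the chain links stored in a separate `next` vector. The names `ChainedHashMap`, `build` and `get_matches` are hypothetical and are not DataFusion's API. Because each insert pushes a row onto the front of its chain, inserting rows in natural order makes a probe return matches last-to-first; inserting them in reverse is one simple way to get the natural input order back.

```rust
use std::collections::HashMap;

/// Hypothetical chained hash map (not DataFusion's actual type): `map` stores,
/// per hash, the head of a chain of build-side rows as `row index + 1`
/// (0 means "empty"), and `next[i]` links row `i` to the next row in its chain,
/// using the same `index + 1` encoding (0 ends the chain).
pub struct ChainedHashMap {
    map: HashMap<u64, u64>,
    next: Vec<u64>,
}

impl ChainedHashMap {
    /// Builds the map from per-row hashes of the build side.
    ///
    /// Every insert pushes the row onto the front of its chain, so a probe
    /// walks rows in reverse insertion order. With `fifo == false`, rows are
    /// inserted in natural order and matches come back last-to-first (LIFO).
    /// With `fifo == true`, rows are inserted in reverse, so matches come back
    /// in natural input order (FIFO).
    pub fn build(hashes: &[u64], fifo: bool) -> Self {
        let mut map: HashMap<u64, u64> = HashMap::new();
        let mut next = vec![0u64; hashes.len()];
        let rows: Vec<usize> = if fifo {
            (0..hashes.len()).rev().collect()
        } else {
            (0..hashes.len()).collect()
        };
        for row in rows {
            let head = map.entry(hashes[row]).or_insert(0);
            next[row] = *head; // link the new row to the previous chain head
            *head = row as u64 + 1; // the new row becomes the chain head
        }
        Self { map, next }
    }

    /// Returns the build-side row indices whose hash equals `hash`,
    /// in the order a probe would visit them.
    pub fn get_matches(&self, hash: u64) -> Vec<usize> {
        let mut out = Vec::new();
        let mut cur = self.map.get(&hash).copied().unwrap_or(0);
        while cur != 0 {
            let row = (cur - 1) as usize;
            out.push(row);
            cur = self.next[row];
        }
        out
    }
}
```

   With build-side hashes `[7, 9, 7, 7]`, `get_matches(7)` returns `[3, 2, 0]` when the map is built with `fifo = false` and `[0, 2, 3]` with `fifo = true`. Reversing the insertion order adds no work at probe time, which may make it preferable to reversing each match list after the lookup.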
   
   The first two parts could potentially be separate issues/PRs, while the third 
is all about merging (or, more likely, rewriting from scratch), so I'd highly 
appreciate help with either of the first two.
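   For item 2, a stream-level state machine might look roughly like the minimal sketch below. The names `JoinState`, `Event` and `next_state` are hypothetical. They only illustrate the idea of driving the whole stream through explicit states, rather than keeping a state machine just for matched-index bookkeeping. A real `HashJoinStream` would presumably carry data inside these states (for example, the current probe batch and an offset into it).

```rust
/// Hypothetical states for a hash join stream; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinState {
    /// Waiting until the build side has been collected into the hash map.
    WaitBuildSide,
    /// Polling the probe side for its next batch.
    FetchProbeBatch,
    /// Joining the current probe batch; may take several polls if output
    /// is emitted in bounded chunks.
    ProcessProbeBatch,
    /// Probe side exhausted; emit unmatched build-side rows if the join type
    /// requires them.
    ExhaustedProbeSide,
    /// Nothing left to emit.
    Completed,
}

/// Hypothetical events that drive the transitions.
pub enum Event {
    BuildSideReady,
    ProbeBatch,
    ProbeExhausted,
    BatchPartiallyDone,
    BatchDone,
    FinalOutputDone,
}

/// Pure transition function: the stream's progress lives in a single state
/// value instead of being spread across index-tracking helpers.
pub fn next_state(state: JoinState, event: Event) -> JoinState {
    use JoinState::*;
    match (state, event) {
        (WaitBuildSide, Event::BuildSideReady) => FetchProbeBatch,
        (FetchProbeBatch, Event::ProbeBatch) => ProcessProbeBatch,
        (FetchProbeBatch, Event::ProbeExhausted) => ExhaustedProbeSide,
        // The current batch still has unprocessed rows: stay and poll again.
        (ProcessProbeBatch, Event::BatchPartiallyDone) => ProcessProbeBatch,
        (ProcessProbeBatch, Event::BatchDone) => FetchProbeBatch,
        (ExhaustedProbeSide, Event::FinalOutputDone) => Completed,
        // Any other combination leaves the state unchanged in this sketch.
        (s, _) => s,
    }
}
```

   Starting from `WaitBuildSide`, the event sequence `BuildSideReady`, `ProbeBatch`, `BatchPartiallyDone`, `BatchDone`, `ProbeExhausted`, `FinalOutputDone` ends in `Completed`.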
   
   Any thoughts on this plan?


-- 
This is an automated message from the Apache Git Service.