2010YOUY01 commented on PR #16660:
URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3219835880

   I tend to think it's better to include the planner part in the initial PR. 
If we do it in two steps, the executor may end up incompatible with other 
operators, so the follow-on PR would also have a large diff.
   e.g. I think projections are required (say the left input has 2 columns 
`a`, `b` and the right input has 2 columns `c`, `d`; the output might only 
contain `a`, `c`, since `b` and `d` are only used to evaluate the join 
condition and are not required in the output), but that's not implemented now.
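
To make the projection point concrete, here is a minimal sketch in plain Rust, not the DataFusion API. The helper names (`project_row`, `join_with_projection`), the `i64` row layout, and the `b < d` join condition are all assumptions made for illustration, and a nested loop stands in for the actual merge logic. It only shows that `b` and `d` are read to evaluate the condition and then dropped from the output.

```rust
/// Hypothetical helper: keep only the columns at `indices` from a joined row.
fn project_row(row: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| row[i]).collect()
}

/// Nested-loop stand-in for the join (not the piecewise merge algorithm).
/// Left rows are `[a, b]`, right rows are `[c, d]`; the assumed join
/// condition is `b < d`. `projection` indexes into the joined `[a, b, c, d]`.
fn join_with_projection(
    left: &[[i64; 2]],
    right: &[[i64; 2]],
    projection: &[usize],
) -> Vec<Vec<i64>> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            // `b` and `d` are only needed here, to evaluate the condition.
            if l[1] < r[1] {
                let joined = [l[0], l[1], r[0], r[1]];
                // Only the projected columns reach the output.
                out.push(project_row(&joined, projection));
            }
        }
    }
    out
}
```

Calling `join_with_projection(&left, &right, &[0, 2])` keeps just `a` and `c`; without a projection step the operator would have to emit all four columns and rely on a separate projection above it.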
   Also, if we have a SQL interface to run, there are many existing test cases 
to cover it, which makes it easier to get merged.
   
   To make this task easier we want to shrink this PR. Here are some ideas:
   - Could some preparations to set up the planner be split into individual PRs?
   - I think the execution logic for existence joins (semi/anti/mark) is 
fundamentally different from traditional joins. It might be cleaner to split 
them into a separate stream implementation, since a unified execution path 
can make the state management complex. For the initial PR, we could focus on 
including only one of the two.
   ```
   PiecewiseMergeJoinExec
   --existence-join?--> ExistencePWMJStream
   --not-existence-join?--> TraditionalPWMJStream
   ```
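
The dispatch in the diagram above could look roughly like the sketch below. It is plain Rust with made-up types: `JoinKind`, `PwmjStream`, `is_existence_join` and `build_stream` are hypothetical names, not DataFusion's `JoinType` or any existing stream type, and the variants carry no state. The idea is only that the exec node picks a stream once, up front, so each stream keeps its own state machine.

```rust
/// Hypothetical join kinds, loosely mirroring the semi/anti/mark split.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinKind {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    LeftMark,
}

/// Hypothetical stand-ins for the two stream implementations.
#[derive(Debug, PartialEq)]
enum PwmjStream {
    /// Existence joins only need to know whether a match exists.
    Existence(JoinKind),
    /// Traditional joins materialize the matched row pairs.
    Traditional(JoinKind),
}

/// Semi, anti and mark joins are treated as existence joins here.
fn is_existence_join(kind: JoinKind) -> bool {
    matches!(
        kind,
        JoinKind::LeftSemi | JoinKind::LeftAnti | JoinKind::LeftMark
    )
}

/// Chooses the stream once, when the exec node starts executing.
fn build_stream(kind: JoinKind) -> PwmjStream {
    if is_existence_join(kind) {
        PwmjStream::Existence(kind)
    } else {
        PwmjStream::Traditional(kind)
    }
}
```

With this shape, an initial PR could ship only one arm of `build_stream` and return a not-implemented error for the other.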


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

