[
https://issues.apache.org/jira/browse/TAJO-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362875#comment-14362875
]
Keuntae Park commented on TAJO-680:
-----------------------------------
Hi, [~jihoonson].
I'm also very interested in In-Subquery and have a suggestion for handling it
(only the uncorrelated subquery case for now, because handling correlated
subqueries needs more careful consideration).
My idea is to transform In-Subquery into a Semi- (or Anti-) Join in
LogicalPlanPreprocessor.
(At first, I thought of doing this in the query parsing phase, but the columns
on the left side of the In-Subquery may not be resolvable there.)
1) During visitFilter(), check whether the qual of the Selection contains an
in-subquery Expr.
2) If so, find the relation that owns the column on the left side of the
In-subquery in the relationList, which is a child of the Selection Expr.
3) Make a (semi or anti) joinNode with the found relation as the left relation,
the subquery as the right relation, and the join condition 'left side column of
In-subquery equals projection column of subquery'.
4) The Expr tree also needs to be updated so that the relationList contains the
new join instead of the previous relation, because the tree will be used again
in LogicalPlanner.
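To make steps 1) to 4) concrete, here is a minimal toy sketch in Python. It is
not Tajo code: every class and function name below (Relation, Subquery,
InSubquery, Join, Selection, rewrite_in_subquery) is hypothetical and only
models the shape of the rewrite. It assumes the qual is exactly one
uncorrelated in-subquery predicate with a single projected column.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Relation:
    name: str
    columns: List[str]


@dataclass
class Subquery:
    projection: str          # the single projected column of the subquery
    relation: Relation


@dataclass
class InSubquery:
    column: str              # column on the left side of IN
    subquery: Subquery
    negated: bool = False    # True models NOT IN


@dataclass
class Join:
    kind: str                # "LEFT_SEMI" or "LEFT_ANTI"
    left: Relation
    right: Subquery
    condition: Tuple[str, str]   # (left column, subquery projection column)


@dataclass
class Selection:
    qual: Optional[object]
    relation_list: List[object]


def rewrite_in_subquery(sel: Selection) -> Selection:
    """Toy version of the proposed visitFilter() rewrite."""
    qual = sel.qual
    # Step 1: does the qual contain an in-subquery? (Toy: the qual IS one.)
    if not isinstance(qual, InSubquery):
        return sel
    for i, rel in enumerate(sel.relation_list):
        # Step 2: find the relation that owns the left-side column.
        if isinstance(rel, Relation) and qual.column in rel.columns:
            # Step 3: build the semi (IN) or anti (NOT IN) join.
            kind = "LEFT_ANTI" if qual.negated else "LEFT_SEMI"
            join = Join(kind, rel, qual.subquery,
                        (qual.column, qual.subquery.projection))
            # Step 4: put the join where the old relation was, so later
            # phases see the updated relation list.
            new_list = sel.relation_list[:i] + [join] + sel.relation_list[i + 1:]
            return Selection(qual=None, relation_list=new_list)
    raise ValueError("no relation provides column " + qual.column)
```

For the example query in this issue, the nation relation would be replaced by
a LEFT_SEMI join of nation and the region subquery on
n_regionkey = r_regionkey. NULL semantics of NOT IN are not modelled here.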
After that, the rest is already implemented, because Tajo already has
hash-shuffle-based semi and anti join physical operators.
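As a reminder of what such operators compute, here is a small single-process
sketch of a hash-based semi/anti join in Python. It is not Tajo's physical
operator and has no shuffle step; the function name and the row format
(dicts) are assumptions made for illustration, and NULL handling is
deliberately ignored.

```python
def hash_semi_join(left_rows, right_rows, left_key, right_key, anti=False):
    """Return left rows that have (semi) or lack (anti) a match on the right.

    Rows are plain dicts. Each left row appears at most once in the output,
    no matter how many right rows match it.
    """
    # Build phase: hash the join keys of the right (subquery) side.
    build = {row[right_key] for row in right_rows}
    # Probe phase: keep a left row if its match status fits the join kind.
    return [row for row in left_rows if (row[left_key] in build) != anti]


nations = [
    {"n_name": "ALGERIA", "n_regionkey": 0},
    {"n_name": "BRAZIL", "n_regionkey": 1},
    {"n_name": "NOWHERE", "n_regionkey": 9},
]
regions = [{"r_regionkey": 0}, {"r_regionkey": 1}, {"r_regionkey": 1}]

in_rows = hash_semi_join(nations, regions, "n_regionkey", "r_regionkey")
not_in_rows = hash_semi_join(nations, regions, "n_regionkey", "r_regionkey",
                             anti=True)
```

Here in_rows holds the ALGERIA and BRAZIL rows (each once, despite the
duplicate region key) and not_in_rows holds only the NOWHERE row.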
It may also be possible to implement a new EvalNode that represents the
in-subquery and convert it to a JoinNode in LogicalOptimizer.
In that case, however, the code that handles in-subquery would be spread across
LogicalPlanner (In-subquery recognition) and LogicalOptimizer (plan
transformation).
I also feel that doing this in LogicalPlanPreprocessor is easier than working
in LogicalOptimizer :)
Please leave a comment on the suggestion above.
By the way, does this issue replace TAJO-596?
> Improve the IN operator to support sub queries
> ----------------------------------------------
>
> Key: TAJO-680
> URL: https://issues.apache.org/jira/browse/TAJO-680
> Project: Tajo
> Issue Type: Improvement
> Components: distributed query plan, parser
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Fix For: 0.11.0
>
> Attachments: Distributed plan.png, Logical plan.png
>
>
> Currently, the IN operator can be used with only sets of values.
> We need to improve it to support sub queries as the following example query.
> {noformat}
> tajo> select * from nation where n_regionkey in (select r_regionkey from
> region);
> {noformat}