[ 
https://issues.apache.org/jira/browse/TAJO-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihoon Son updated TAJO-680:
----------------------------
    Attachment: Distributed plan.png
                Logical plan.png

I attached a logical plan and a distributed execution plan for the above 
example query. As noted above, the results of the first phase are written to 
disk and then shuffled according to the key's value. In the second phase, the 
results of the first phase are joined with the outer table. 
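
To make the two phases concrete, here is a minimal Python sketch of the idea. 
It is not Tajo code: the function names, the partition count, and the toy rows 
are made up for illustration, and shuffling the outer table with the same hash 
function is an assumption about how matching keys end up in the same partition.

```python
from collections import defaultdict

def shuffle_by_key(rows, key, num_partitions):
    """Assign each row to a partition by hashing its key value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions

def in_subquery(outer_rows, outer_key, inner_rows, inner_key, num_partitions=4):
    """Evaluate `outer_key IN (subquery)` as a partitioned semi-join."""
    # Phase 1: the subquery result (inner_rows) is shuffled on its key.
    inner_parts = shuffle_by_key(inner_rows, inner_key, num_partitions)
    # Assumption: the outer table is shuffled the same way.
    outer_parts = shuffle_by_key(outer_rows, outer_key, num_partitions)
    # Phase 2: within each partition, keep the outer rows whose key
    # appears in the subquery result. Each outer row is emitted at most once.
    result = []
    for part_id, rows in outer_parts.items():
        keys = {r[inner_key] for r in inner_parts.get(part_id, [])}
        result.extend(r for r in rows if r[outer_key] in keys)
    return result

# Hypothetical toy rows, not real query output.
nation = [
    {"n_name": "ALGERIA", "n_regionkey": 0},
    {"n_name": "ARGENTINA", "n_regionkey": 1},
    {"n_name": "BRAZIL", "n_regionkey": 1},
    {"n_name": "CANADA", "n_regionkey": 1},
    {"n_name": "EGYPT", "n_regionkey": 4},
]
region_subquery_result = [{"r_regionkey": 1}, {"r_regionkey": 4}]

matched = in_subquery(nation, "n_regionkey", region_subquery_result, "r_regionkey")
print(sorted(r["n_name"] for r in matched))
```

With these toy rows, ALGERIA is dropped because its region key 0 is not in the 
subquery result.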

This execution returns valid results even when the inner query does not 
have any relation to the outer relation. As another example, please 
consider the following query.
{noformat}
default> select n_name from nation where n_nationkey in (select count(*) from 
region group by substr(r_name, 1, 1));
{noformat}
This query first counts the occurrences of each first character of the 
region names, and then finds the names of the nations whose keys are in the 
result set of the first phase.
In this case, the result set of the first phase is handled as a kind of table 
that is joined with the nation table.
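
As a rough illustration of treating the subquery result as a table, the 
following Python sketch runs the IN form and a join form of this query on 
SQLite. The toy rows and the function names are assumptions for illustration 
only, and SQLite stands in for Tajo here. Note that the join form needs 
DISTINCT on the derived table; otherwise a count that appears for two groups 
would return the same nation twice.

```python
import sqlite3

IN_FORM = """
SELECT n_name FROM nation
WHERE n_nationkey IN (SELECT count(*) FROM region
                      GROUP BY substr(r_name, 1, 1))
ORDER BY n_name
"""

# The subquery result is used as a derived table joined with nation.
JOIN_FORM = """
SELECT n_name FROM nation
JOIN (SELECT DISTINCT count(*) AS cnt FROM region
      GROUP BY substr(r_name, 1, 1)) AS t
  ON n_nationkey = t.cnt
ORDER BY n_name
"""

def make_toy_db():
    """Build an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nation (n_nationkey INTEGER, n_name TEXT)")
    conn.execute("CREATE TABLE region (r_regionkey INTEGER, r_name TEXT)")
    conn.executemany(
        "INSERT INTO nation VALUES (?, ?)",
        [(0, "ALGERIA"), (1, "ARGENTINA"), (2, "BRAZIL"),
         (3, "CANADA"), (4, "EGYPT")],
    )
    conn.executemany(
        "INSERT INTO region VALUES (?, ?)",
        [(0, "AFRICA"), (1, "AMERICA"), (2, "ASIA"),
         (3, "EUROPE"), (4, "MIDDLE EAST")],
    )
    return conn

def run_both(conn):
    """Return the n_name lists produced by the IN form and the join form."""
    in_rows = [row[0] for row in conn.execute(IN_FORM)]
    join_rows = [row[0] for row in conn.execute(JOIN_FORM)]
    return in_rows, join_rows

in_rows, join_rows = run_both(make_toy_db())
print(in_rows, join_rows)
```

With these toy rows the first-character counts are 3 for 'A' and 1 each for 
'E' and 'M', so both forms return the nations with keys 1 and 3.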

> Improve the IN operator to support sub queries
> ----------------------------------------------
>
>                 Key: TAJO-680
>                 URL: https://issues.apache.org/jira/browse/TAJO-680
>             Project: Tajo
>          Issue Type: Improvement
>          Components: distributed query plan, parser
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.11
>
>         Attachments: Distributed plan.png, Logical plan.png
>
>
> Currently, the IN operator can be used only with sets of values.
> We need to improve it to support subqueries, as in the following example query.
> {noformat}
> tajo> select * from nation where n_regionkey in (select r_regionkey from 
> region); 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
