[
https://issues.apache.org/jira/browse/TAJO-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jihoon Son updated TAJO-1632:
-----------------------------
Fix Version/s: (was: 0.11.0)
0.12.0
> Enable broadcast join planning for outer joins
> ----------------------------------------------
>
> Key: TAJO-1632
> URL: https://issues.apache.org/jira/browse/TAJO-1632
> Project: Tajo
> Issue Type: Improvement
> Components: distributed query plan
> Reporter: Jihoon Son
> Fix For: 0.12.0
>
>
> TAJO-1553 was recently resolved to improve broadcast join planning, but it
> has a limitation for outer joins: _for outer joins, preserved-row relations
> are not broadcastable, to avoid input data duplication._ This rule can limit
> broadcast join opportunities. Consider the following query as an example.
> {noformat}
> select * from a left outer join b left outer join c
> (a, b, and c are sufficiently small to be broadcasted.)
> {noformat}
> Note that two consecutive left outer joins are associative. That is, their
> execution order can be changed without invalidating the result. Thus, the
> candidate query plans are as follows. (LOJ is short for left outer join.)
> 1)
> {noformat}
>       LOJ
>      /   \
>    LOJ    c
>   /   \
>  a     b
> {noformat}
> 2)
> {noformat}
>    LOJ
>   /   \
>  a    LOJ
>      /   \
>     b     c
> {noformat}
> In query plan 1), only *a* is a preserved-row relation. Thus, if query plan
> 1) is selected, our current broadcast join planner turns the entire query
> plan into a single execution block with *b* and *c* as broadcast relations.
> In contrast, if query plan 2) is selected, it is executed with two execution
> blocks, each of which performs a left outer join, because only *c* is not
> preserved-row and thus broadcastable.
> This limitation, which depends on the form of the selected query plan, will
> degrade the performance of outer join processing.
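The difference between the two plans can be illustrated with a small toy model. This is only a sketch and not Tajo's planner: the `Rel` and `LOJ` classes and the functions `broadcast_relations` and `join_blocks` are hypothetical names introduced here. It assumes that a base relation is broadcastable only when it is the right (non-preserved) input of its join, and that a join whose right input is not a base relation needs an execution block of its own.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rel:
    """A base relation (hypothetical model, assumed small enough to broadcast)."""
    name: str


@dataclass(frozen=True)
class LOJ:
    """A left outer join; `left` is the preserved-row input."""
    left: "Node"
    right: "Node"


Node = Union[Rel, LOJ]


def broadcast_relations(node: Node) -> set:
    """Base relations that sit on the non-preserved (right) side of their join."""
    if isinstance(node, Rel):
        return set()
    found = broadcast_relations(node.left) | broadcast_relations(node.right)
    if isinstance(node.right, Rel):
        found.add(node.right.name)
    return found


def join_blocks(node: Node) -> int:
    """Number of execution blocks that perform a join, under the toy rule."""
    if isinstance(node, Rel):
        return 0
    blocks = join_blocks(node.left) + join_blocks(node.right)
    if isinstance(node.right, Rel):
        # Broadcast join: it runs in the same block as its left input
        # (or opens the first block if the left input is a bare relation).
        return max(blocks, 1)
    # The right input is an intermediate result, so it cannot be broadcast
    # as a base relation and the join gets a block of its own.
    return blocks + 1


# Plan 1): (a LOJ b) LOJ c
plan1 = LOJ(LOJ(Rel("a"), Rel("b")), Rel("c"))
# Plan 2): a LOJ (b LOJ c)
plan2 = LOJ(Rel("a"), LOJ(Rel("b"), Rel("c")))

for label, plan in (("plan 1", plan1), ("plan 2", plan2)):
    print(label, sorted(broadcast_relations(plan)), join_blocks(plan))
```

Under these assumptions the model reproduces the behaviour described in the issue: plan 1) broadcasts *b* and *c* within one block, while plan 2) broadcasts only *c* and needs two blocks.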
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)