[
https://issues.apache.org/jira/browse/CALCITE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145678#comment-16145678
]
Jesus Camacho Rodriguez commented on CALCITE-1965:
--------------------------------------------------
[~christian.beikov], this is all very exciting, thanks for putting together the
examples.
One aspect that needs more thought is the structure of the OEQ, since just
storing the predicates doing a DFS traversal of the join tree will not work for
more complex cases. In particular, I think it would be necessary to add
structural identifiers to OEQ, e.g. pre-post-depth identifiers are an example,
to actually know the exact position of the predicate when we have multi-level
join trees.
For instance, consider the following examples, which have the same OEQ but are
not equivalent:
{noformat}
EQ1={c.b, b.b}
OEQ1={e.d, d.d}{c.c, d.c}{a.a, b.a}
ljoin(e.d = d.d)
/ \
table(e) ljoin(c.c = d.c)
/ \
ijoin(c.b = b.b) table(d)
/ \
table(c) ljoin(a.a = b.a)
/ \
table(a) table(b)
EQ2={c.b, b.b}
OEQ2={e.d, d.d}{c.c, d.c}{a.a, b.a}
ljoin(e.d = d.d)
/ \
table(e) ljoin(c.c = d.c)
/ \
ijoin(a.a = b.b) table(d)
/ \
table(a) ljoin(c.b = b.a)
/ \
table(c) table(b)
{noformat}
However, adding the identifiers:
{noformat}
EQ'1= 4-6-3{c.b, b.b}
OEQ'1= 1-9-1{e.d, d.d} 3-8-2{c.c, d.c} 6-5-4{a.a, b.a}
EQ'2= 6-5-4{c.b, b.b}
OEQ'2= 1-9-1{e.d, d.d} 3-8-2{c.c, d.c} 4-6-3{a.a, b.a}
{noformat}
Other schemes for the identifiers could be considered too. I have not thought
how this would play with your idea for the matching / rewriting as it will add
certain complexity, what do you think?
Another question I have is how to deal with multiple repetitions of same
table... With the identifiers this seems easier, but without them, I am not
sure whether we would be able to rewrite those cases.
Wrt the normalization step (rewriting to left outer joins), it can be done via
transformation rule {{JoinCommuteRule.java}}, which could be triggered before
attempting the rewriting.
Another aspect to control would be where to trigger the rule, trying to
minimize the number of executions (but this can be done later on).
About subqueries, for your particular example above, when the subqueries are
parsed they will be rewritten into a plan with a left outer join over a left
outer join, thus it would not be a problem. I was talking about these kind of
plans that can result from the subquery or the ON clause having other Filter
expressions:
*Query1 With Filter*
{noformat}
select *
from a
left join (
select b.a as x
from b
left join c on b.b = c.b
where b.s = 'foo'
) on a.a = x
ijoin(a.a = b.a)
/ \
/ filter(b.s = 'foo)
/ \
table(a) ljoin(b.b = c.b)
/ \
table(b) table(c) :
{noformat}
*Query2 With Filter*
{noformat}
select *
from a
left join (
select b.a as x
from (select b.a, b.b from b where b.s = 'foo') b
left join c on b.b = c.b
) on a.a = x
ijoin(a.a = b.a)
/ \
table(a) ljoin(b.b = c.b)
/ \
filter(b.s = 'foo') \
/ \
table(b) table(c) :
{noformat}
*Query3 With Filter*
{noformat}
select *
from a
left join (
select b.a as x
from b
left join (select c.b from c where c.s = 'foo') c on b.b = c.b
) on a.a = x
ijoin(a.a = b.a)
/ \
table(a) ljoin(b.b = c.b)
/ \
/ filter(c.s = 'foo')
/ \
table(b) table(c) :
{noformat}
In those cases, we could bail out from the rewriting right now, but I just
wanted to know whether you had considered how to tackle them, as having some
filtering in the query is a pattern that will be quite common, at least on top
of the table scan operators as in the last two examples.
Another more complex example also for a follow-up:
*Query4 With Filter*
{noformat}
select *
from a
left join (
select b.a as x
from b
left join c on b.b = c.b and b.s = 'foo'
) on a.a = x
ijoin(a.a = b.a)
/ \
table(a) ljoin(b.b = c.b and b.s = 'foo')
/ \
table(b) table(c) :
{noformat}
I was not aware that the same authors had tackled the rewriting problem for
outer joins too, my bad, thanks!
The approach that they discuss in that article seems to cover the case when you
have a Filter operator on top of the TS too. Also it seems to play well with
the approach in the previous paper / what we already implemented in Calcite. I
will check it further too and let you know.
> Support outer joins for materialized views
> ------------------------------------------
>
> Key: CALCITE-1965
> URL: https://issues.apache.org/jira/browse/CALCITE-1965
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Christian Beikov
> Assignee: Julian Hyde
>
> Currently, only inner joins are supported for materialized view
> substitutions. The support for outer joins involves creating new pulled up
> predicates in case of outer joins that represent semantics of the join. For a
> join predicate like "a.id = b.id" the inner join just pulls up that
> predicate. When having a left join like e.g. {{select * from a left join b on
> a.id = b.id}}, the actual pulled up predicate would be {{OR(=(a.id,
> b.id),ISNULL(b.id))}}. For a right join it would be {{OR(=(a.id,
> b.id),ISNULL(a.id))}} and for a full outer join it would be {{OR(=(a.id,
> b.id),ISNULL(a.id),ISNULL(b.id))}}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)