Jesus Camacho Rodriguez created CALCITE-1682:
------------------------------------------------
Summary: New metadata providers for expression column origin and
all predicates in plan
Key: CALCITE-1682
URL: https://issues.apache.org/jira/browse/CALCITE-1682
Project: Calcite
Issue Type: New Feature
Components: core
Affects Versions: 1.12.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
I am working on the integration of materialized view rewriting within Hive.
Once a view matches an operator plan, rewriting is split broadly into two steps.
The first step will verify that the input to the root operator of the matched
plan is equivalent to or contained within the input to the root operator of the
query representing the view. The second step will trigger a _unify_ rule, which
tries to rewrite the matched operator tree into a scan on the view and possibly
some additional operators to compute the exact results needed by the query
(for example, a Project that alters the column order, an additional Filter on
the view, an additional Join operation, etc.).
If we focus on step 1, checking equivalence/containment, I would like to extend
the metadata providers in Calcite to give us more information about the matched
(sub)plan. In particular, I am thinking of:
- Expression column origin. Currently Calcite can provide the column origins
for a certain column and whether it is derived or not. However, we would need
to obtain the expression that generated a certain column. This expression
should contain references to the input tables. For instance, given column
_c_, the new metadata provider would return that it was generated by the
expression _A.a + B.b_.
- All predicates. Currently Calcite can extract predicates that have been
applied on a RelNode output (we can think of them as constraints on the
output). However, I would like to extract all predicates that have been applied
on a given RelNode (sub)plan. Since the columns involved might not be part of
the output, expressions should contain references to the input tables. For
instance, the new metadata provider might return the expressions
_A.a + B.b > C.c AND D.d = 100_.
- PK-FK relationship. I do not plan to implement this one immediately. However,
exposing this information (when it is provided) can help us trigger more
rewritings that contain join operators. Thus, I was wondering whether it is
worth adding.
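
To make the first two items concrete, here is a minimal sketch of the idea on a toy plan model. It is not Calcite code: `LineageSketch`, `Node`, `scan`, `project`, `filter`, `join` and `substitute` are hypothetical names, expressions are plain strings, and `$i` is assumed to mean "column i of the input". Each operator rewrites its expressions in terms of table columns by substituting the lineage of its input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model for illustration only; none of these names are Calcite APIs.
final class LineageSketch {
  private static final Pattern INPUT_REF = Pattern.compile("\\$(\\d+)");

  interface Node {
    List<String> lineage();        // one table-qualified expression per output column
    List<String> allPredicates();  // every predicate applied anywhere in the subplan
  }

  // Replaces each "$i" in expr with the i-th input expression,
  // adding parentheses when the substituted expression is compound.
  static String substitute(String expr, List<String> inputs) {
    Matcher m = INPUT_REF.matcher(expr);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String in = inputs.get(Integer.parseInt(m.group(1)));
      String repl = in.contains(" ") ? "(" + in + ")" : in;
      m.appendReplacement(sb, Matcher.quoteReplacement(repl));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  static Node scan(final String table, final String... columns) {
    return new Node() {
      public List<String> lineage() {
        List<String> out = new ArrayList<>();
        for (String c : columns) out.add(table + "." + c);
        return out;
      }
      public List<String> allPredicates() { return new ArrayList<>(); }
    };
  }

  static Node project(final Node input, final String... exprs) {
    return new Node() {
      public List<String> lineage() {
        List<String> in = input.lineage();
        List<String> out = new ArrayList<>();
        for (String e : exprs) out.add(substitute(e, in));
        return out;
      }
      public List<String> allPredicates() { return input.allPredicates(); }
    };
  }

  static Node filter(final Node input, final String condition) {
    return new Node() {
      public List<String> lineage() { return input.lineage(); }
      public List<String> allPredicates() {
        List<String> out = new ArrayList<>(input.allPredicates());
        out.add(substitute(condition, input.lineage()));
        return out;
      }
    };
  }

  static Node join(final Node left, final Node right, final String condition) {
    return new Node() {
      public List<String> lineage() {
        List<String> out = new ArrayList<>(left.lineage());
        out.addAll(right.lineage());
        return out;
      }
      public List<String> allPredicates() {
        List<String> out = new ArrayList<>(left.allPredicates());
        out.addAll(right.allPredicates());
        out.add(substitute(condition, lineage()));
        return out;
      }
    };
  }

  public static void main(String[] args) {
    // Project[$0 + $2](Filter[$2 > 100](Join[$1 = $3](A(a, k), B(b, k))))
    Node plan = project(
        filter(join(scan("A", "a", "k"), scan("B", "b", "k"), "$1 = $3"), "$2 > 100"),
        "$0 + $2");
    System.out.println(plan.lineage());        // [A.a + B.b]
    System.out.println(plan.allPredicates());  // [A.k = B.k, B.b > 100]
  }
}
```

Note that neither predicate refers to a column of the final output, which is why both are expressed against the input tables. A real provider would of course work on RexNode trees rather than strings.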
Once this information is available, we can rely on it to implement logic
similar to [1] to check whether a given (sub)plan is equivalent to or contained
within a given view.
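
As a rough illustration of how such a check could consume that metadata, the sketch below performs a deliberately simplified, purely syntactic test. It is not the algorithm from [1]: it assumes both plans read exactly the same tables, that predicates are already split into conjuncts and normalized to identical strings, and that the view exposes every column the residual predicates need. `ContainmentSketch`, `residualPredicates` and `setOf` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified containment test; not the algorithm described in [1].
final class ContainmentSketch {

  // If every conjunct of the view also appears in the query (over the same tables),
  // the query result is contained in the view. Returns the query conjuncts that
  // still have to be applied on top of the view, or empty if the test fails.
  static Optional<Set<String>> residualPredicates(
      Set<String> queryTables, Set<String> queryConjuncts,
      Set<String> viewTables, Set<String> viewConjuncts) {
    if (!queryTables.equals(viewTables)) {
      return Optional.empty();
    }
    if (!queryConjuncts.containsAll(viewConjuncts)) {
      return Optional.empty();
    }
    Set<String> residual = new LinkedHashSet<>(queryConjuncts);
    residual.removeAll(viewConjuncts);
    return Optional.of(residual);
  }

  static Set<String> setOf(String... items) {
    return new LinkedHashSet<>(Arrays.asList(items));
  }

  public static void main(String[] args) {
    Set<String> tables = setOf("A", "B");
    // The view keeps only the join predicate; the query adds a filter on B.b.
    Optional<Set<String>> residual = residualPredicates(
        tables, setOf("A.k = B.k", "B.b > 100"),
        tables, setOf("A.k = B.k"));
    System.out.println(residual.isPresent() ? residual.get() : "no match");

    // A view that filters on something the query does not cannot be used.
    Optional<Set<String>> none = residualPredicates(
        tables, setOf("A.k = B.k"),
        tables, setOf("A.k = B.k", "A.a < 5"));
    System.out.println(none.isPresent() ? none.get() : "no match");
  }
}
```

An empty residual set corresponds to the equivalence case; a non-empty one corresponds to containment, with the residual conjuncts becoming the additional Filter on the view.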
One question I have is how to represent the table columns as a RexNode, as I
think that is the easiest form for the new metadata providers to return. I
checked _RexPatternFieldRef_ and I think it will meet our requirements: alpha
would be the qualified table name, while the index would be the column index
within the table. Thoughts?
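
For discussion, the shape of such a reference could look like the standalone value class below. This is only a sketch of the two pieces of information involved (qualified table name plus column index); `TableColumnRef` is a hypothetical name and does not extend or reproduce any Calcite class.

```java
import java.util.Objects;

// Hypothetical value class for illustration; not a Calcite RexNode.
final class TableColumnRef {
  final String qualifiedTableName;  // plays the role of "alpha"
  final int columnIndex;            // position of the column within the table

  TableColumnRef(String qualifiedTableName, int columnIndex) {
    this.qualifiedTableName = Objects.requireNonNull(qualifiedTableName);
    this.columnIndex = columnIndex;
  }

  // Value equality, so that references produced for the query plan and for
  // the view plan can be compared directly.
  @Override public boolean equals(Object o) {
    if (!(o instanceof TableColumnRef)) {
      return false;
    }
    TableColumnRef other = (TableColumnRef) o;
    return qualifiedTableName.equals(other.qualifiedTableName)
        && columnIndex == other.columnIndex;
  }

  @Override public int hashCode() {
    return Objects.hash(qualifiedTableName, columnIndex);
  }

  @Override public String toString() {
    return qualifiedTableName + ".$" + columnIndex;
  }

  public static void main(String[] args) {
    TableColumnRef fromQuery = new TableColumnRef("hr.emps", 2);
    TableColumnRef fromView = new TableColumnRef("hr.emps", 2);
    System.out.println(fromQuery + " equals " + fromView + ": " + fromQuery.equals(fromView));
  }
}
```

One thing this makes visible: a table name plus index identifies a column but not which occurrence of the table it belongs to, so a self-join would need some extra disambiguation.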
I have started working on this and will provide a patch shortly; feedback is
greatly appreciated.
[1]
ftp://ftp10.us.freebsd.org/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf