[
https://issues.apache.org/jira/browse/CALCITE-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903464#comment-15903464
]
Jesus Camacho Rodriguez commented on CALCITE-1682:
--------------------------------------------------
[~julianhyde], one question I have concerns the unique identification of a
table (within a TableScan) in the plan for expression origin/lineage. If we
have a self-join on table A with condition a1.x = a2.y, I would like the
lineage to be different from that of a filter on a single table A with
condition a.x = a.y. However, _getQualifiedName_ for the tables concerned will
return the same qualified name (obviously)... Any thoughts on how we solve
this?
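
To make the question concrete, here is a minimal sketch in plain Java of one possible direction: tag each table reference with an occurrence number in addition to its qualified name, so that two scans of the same table compare as different references. This is not Calcite API. The names TableRef, ColumnRef, Equals and the occurrence field, as well as the schema name "hr", are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: none of these classes exist in Calcite.
public class TableRefSketch {

  /** One occurrence of a table in a plan: qualified name plus occurrence number. */
  static final class TableRef {
    final List<String> qualifiedName;
    final int occurrence; // assumption: 0 for the first scan of the table, 1 for the next

    TableRef(List<String> qualifiedName, int occurrence) {
      this.qualifiedName = qualifiedName;
      this.occurrence = occurrence;
    }

    @Override public boolean equals(Object o) {
      return o instanceof TableRef
          && qualifiedName.equals(((TableRef) o).qualifiedName)
          && occurrence == ((TableRef) o).occurrence;
    }

    @Override public int hashCode() {
      return Objects.hash(qualifiedName, occurrence);
    }

    @Override public String toString() {
      return String.join(".", qualifiedName) + "#" + occurrence;
    }
  }

  /** A column of one specific table occurrence. */
  static final class ColumnRef {
    final TableRef table;
    final String column;

    ColumnRef(TableRef table, String column) {
      this.table = table;
      this.column = column;
    }

    @Override public boolean equals(Object o) {
      return o instanceof ColumnRef
          && table.equals(((ColumnRef) o).table)
          && column.equals(((ColumnRef) o).column);
    }

    @Override public int hashCode() {
      return Objects.hash(table, column);
    }

    @Override public String toString() {
      return table + "." + column;
    }
  }

  /** An equality predicate between two column references. */
  static final class Equals {
    final ColumnRef left;
    final ColumnRef right;

    Equals(ColumnRef left, ColumnRef right) {
      this.left = left;
      this.right = right;
    }

    @Override public boolean equals(Object o) {
      return o instanceof Equals
          && left.equals(((Equals) o).left)
          && right.equals(((Equals) o).right);
    }

    @Override public int hashCode() {
      return Objects.hash(left, right);
    }

    @Override public String toString() {
      return left + " = " + right;
    }
  }

  private static final List<String> A = Arrays.asList("hr", "A");

  /** Self-join on A with condition a1.x = a2.y: two distinct occurrences of A. */
  static Equals selfJoinCondition() {
    return new Equals(new ColumnRef(new TableRef(A, 0), "x"),
        new ColumnRef(new TableRef(A, 1), "y"));
  }

  /** Filter on a single scan of A with condition a.x = a.y: one occurrence of A. */
  static Equals filterCondition() {
    TableRef a = new TableRef(A, 0);
    return new Equals(new ColumnRef(a, "x"), new ColumnRef(a, "y"));
  }

  public static void main(String[] args) {
    System.out.println(selfJoinCondition());
    System.out.println(filterCondition());
    System.out.println(selfJoinCondition().equals(filterCondition()));
  }
}
```

With the qualified name alone, both conditions would read as A.x = A.y. With the occurrence number, the two sides of the self-join condition refer to different table references, so the two lineages no longer compare as equal. How occurrence numbers would be assigned consistently across a plan is left open here.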
> New metadata providers for expression column origin and all predicates in plan
> ------------------------------------------------------------------------------
>
> Key: CALCITE-1682
> URL: https://issues.apache.org/jira/browse/CALCITE-1682
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.12.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
>
> I am working on the integration of materialized view rewriting within Hive.
> Once a view matches an operator plan, rewriting is split broadly into two
> steps. The first step will verify that the input to the root operator of the
> matched plan is equivalent to, or contained within, the input to the root
> operator of the query representing the view. The second step will trigger a
> _unify_ rule, which tries to rewrite the matched operator tree into a scan on
> the view and possibly some additional operators to compute the exact results
> needed by the query (for example, a Project that alters the column order, an
> additional Filter on the view, an additional Join operation, etc.).
> If we focus on step 1, checking equivalence/containment, I would like to
> extend the metadata providers in Calcite to give us more information about
> the matched (sub)plan. In particular, I am thinking of:
> - Expression column origin. Currently Calcite can provide the column origins
> for a certain column and whether it is derived or not. However, we would need
> to obtain the expression that generated a certain column. This expression
> should contain references to the input tables. For instance, given expression
> column _c_, the new metadata provider would return that it was generated by
> expression _A.a + B.b_.
> - All predicates. Currently Calcite can extract predicates that have been
> applied on a RelNode output (we can think of them as constraints on the
> output). However, I would like to extract all predicates that have been
> applied on a given RelNode (sub)plan. Since the referenced columns might not
> be part of the output, expressions should contain references to the input
> tables. For instance, the new metadata provider might return the expressions
> _A.a + B.b > C.c AND D.d = 100_.
> - PK-FK relationship. I do not plan to implement this one immediately.
> However, exposing this information (when it is provided) can help us to
> trigger more rewritings that contain join operators. Thus, I was wondering
> whether it is worth adding.
> Once this information is available, we can rely on it to implement logic
> similar to [1] to check whether a given (sub)plan is equivalent to, or
> contained within, a given view.
> One question I have is about representing the table columns as a RexNode, as
> I think that is the easiest form for the new metadata providers to return. I
> checked _RexPatternFieldRef_ and I think it will meet our requirements: alpha
> would be the qualified table name, while the index would be the column index
> within the table. Thoughts?
> I have started working on this and will provide a patch shortly; feedback is
> greatly appreciated.
> [1]
> ftp://ftp10.us.freebsd.org/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf