[ 
https://issues.apache.org/jira/browse/CALCITE-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904767#comment-15904767
 ] 

Jesus Camacho Rodriguez commented on CALCITE-1682:
--------------------------------------------------

Using constraints could be an elegant way of defining it. Since this would be 
useful to check whether rows are in the view with the right duplication factor, 
will _IN_ imply that there is a single match? Or will we need a stronger 
constraint?

> New metadata providers for expression column origin and all predicates in plan
> ------------------------------------------------------------------------------
>
>                 Key: CALCITE-1682
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1682
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.12.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>
> I am working on the integration of materialized view rewriting within Hive.
> Once a view matches an operator plan, rewriting is roughly split into two steps. 
> The first step will verify that the input to the root operator of the matched 
> plan is equivalent or contained within the input to the root operator of the 
> query representing the view. The second step will trigger a _unify_ rule, 
> which tries to rewrite the matched operator tree into a scan on the view and 
> possibly some additional operators to compute the exact results needed by the 
> query (for example, a Project that alters the column order, an additional 
> Filter on the view, an additional Join operation, etc.).
> If we focus on step 1, checking equivalence/containment, I would like to 
> extend the metadata providers in Calcite to give us more information about 
> the matched (sub)plan. In particular, I am thinking of:
> - Expression column origin. Currently Calcite can provide the column origins 
> for a certain column and whether it is derived or not. However, we would need 
> to obtain the expression that generated a certain column. This expression 
> should contain references to the input tables. For instance, given expression 
> column _c_, the new md provider would return that it was generated by 
> expression _A.a + B.b_. 
> - All predicates. Currently Calcite can extract predicates that have been 
> applied on a RelNode output (we can think of them as constraints on the 
> output). However, I would like to extract all predicates that have been 
> applied on a given RelNode (sub)plan. Since the columns they reference might 
> not be part of the output, expressions should contain references to the 
> input tables. For instance, the new md provider might return the expression 
> _A.a + B.b > C.c AND D.d = 100_.
> - PK-FK relationship. I do not plan to implement this one immediately. 
> However, exposing this information (when it is provided) can help us 
> trigger more rewritings that contain join operators. Thus, I was wondering 
> if it is worth adding it.
> Once this information is available, we can rely on it to implement logic 
> similar to [1] to check whether a given (sub)plan is equivalent/contained 
> within a given view.
> One question I have is about representing the table columns as a RexNode, as 
> I think that is the easiest form for the new metadata providers to return. I 
> checked _RexPatternFieldRef_ and I think it will meet our requirements: alpha 
> would be the qualified table name, while the index is the column index 
> within the table. Thoughts?
> I have started working on this and will provide a patch shortly; feedback is 
> greatly appreciated.
> [1] 
> ftp://ftp10.us.freebsd.org/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf
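
To make the first two items in the description above concrete, here is a
minimal sketch of what "expression column origin" and "all predicates" could
compute. It is a toy model written for illustration only: it does not use
Calcite classes, and every name in it (LineageSketch, Node, scan, join,
filter, project, substitute, example) is hypothetical. Expressions are plain
strings in which $i stands for input column i, and each node rewrites them in
terms of table columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model, NOT Calcite's API: all names here are hypothetical. */
public class LineageSketch {
    interface Node {
        int width();                   // number of output columns
        String origin(int i);          // output column i in terms of table columns
        List<String> allPredicates();  // every predicate applied in this (sub)plan
    }

    static Node scan(String table, String... cols) {
        return new Node() {
            public int width() { return cols.length; }
            public String origin(int i) { return table + "." + cols[i]; }
            public List<String> allPredicates() { return new ArrayList<>(); }
        };
    }

    static Node join(Node left, Node right) {
        return new Node() {
            public int width() { return left.width() + right.width(); }
            public String origin(int i) {
                // Left columns come first, then right columns.
                return i < left.width() ? left.origin(i) : right.origin(i - left.width());
            }
            public List<String> allPredicates() {
                List<String> out = new ArrayList<>(left.allPredicates());
                out.addAll(right.allPredicates());
                return out;
            }
        };
    }

    static Node filter(Node input, String condition) {
        return new Node() {
            public int width() { return input.width(); }
            public String origin(int i) { return input.origin(i); }
            public List<String> allPredicates() {
                // Keep the predicates below and add this one, rewritten over tables.
                List<String> out = new ArrayList<>(input.allPredicates());
                out.add(substitute(condition, input));
                return out;
            }
        };
    }

    static Node project(Node input, String... exprs) {
        return new Node() {
            public int width() { return exprs.length; }
            public String origin(int i) { return substitute(exprs[i], input); }
            public List<String> allPredicates() { return input.allPredicates(); }
        };
    }

    /** Replaces each $i in the template with the origin of input column i. */
    static String substitute(String template, Node input) {
        Matcher m = Pattern.compile("\\$(\\d+)").matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String ref = input.origin(Integer.parseInt(m.group(1)));
            // Parenthesize compound expressions so nesting keeps its meaning.
            if (ref.contains(" ")) ref = "(" + ref + ")";
            m.appendReplacement(sb, Matcher.quoteReplacement(ref));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Project(c = $0 + $1) over two Filters over A join B join C join D. */
    static Node example() {
        Node joined = join(join(join(scan("A", "a"), scan("B", "b")),
                                scan("C", "c")), scan("D", "d"));
        Node filtered = filter(filter(joined, "$0 + $1 > $2"), "$3 = 100");
        return project(filtered, "$0 + $1");
    }

    public static void main(String[] args) {
        Node plan = example();
        System.out.println(plan.origin(0));        // A.a + B.b
        System.out.println(plan.allPredicates());  // [A.a + B.b > C.c, D.d = 100]
    }
}
```

Note that the two predicates are reported even though the final Project
drops columns C.c and D.d from the output, which is why the expressions have
to reference the input tables rather than output columns. A real provider
would presumably return RexNode trees instead of strings.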



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)