[
https://issues.apache.org/jira/browse/CALCITE-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951117#comment-15951117
]
Jesus Camacho Rodriguez commented on CALCITE-1731:
--------------------------------------------------
[~julianhyde], thanks for the feedback. I have updated the PR and respond to
your comments below:
* I have added the missing comments, elaborating on how the new
metadata providers work.
* The paper cited above refers to them as _compensation_
predicates/expressions. If I remember correctly, I have seen them referred to
as _differential_ predicates/expressions too.
In the method description, the relation between condition, target, and residue
is defined as:
{{condition = target AND residue}}
In the example you mention, we have condition: x = 1, target: x = 1 OR z = 3.
Then the residue was: NOT (z = 3), which is not correct.
{code}
x = 1 <=> (x = 1 OR z = 3) AND NOT(z = 3)
x = 1 <=> (x = 1 AND NOT(z = 3)) OR (z = 3 AND NOT(z = 3))
x = 1 <=> (x = 1 AND NOT(z = 3)) (it does not hold)
{code}
However, the current result (residue: x = 1) is correct:
{code}
x = 1 <=> (x = 1 OR z = 3) AND (x = 1) (it holds)
{code}
* Replaced qualifiedName in _RelTableRef_ with a list of Strings; I do not know why
I was handling qualifiedName as a single String in the first place.
* Replaced _RelTableRef.identifier_ by _RelTableRef.entityNumber_. Is it better?
* I have added comments for _RexInputTableRef_ and _RelTableRef_ stating their
purpose. I also nested _RelTableRef_ within _RexInputTableRef_, probably a
better way of encapsulating them both. _RexInputTableRef_ is useful in the
context of provenance/lineage, thus it will not be used throughout the whole
planning phase. In fact, its usage is currently constrained to the new
rewriting rule. Any new rewriting rule/auxiliary method will be able to make
use of these new classes, but expressions generated by rewriting rules for the
planner should not contain them anymore.
* The PR already contains new tests for _SubstitutionVisitor.splitFilter_, which is
the logic that was extended in SubstitutionVisitor (_splitOr_ was also changed,
but it is called from _splitFilter_). Existing tests were also updated. In
addition, the new rewriting rule relies on _splitFilter_ to generate compensation
predicates, thus it is indirectly tested in MaterializationTest too.
* I have extended _materialized_views.md_.
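
The residue argument in the second point above can also be checked by brute force. The following is only a sketch and not code from the PR: the class _ResidueCheck_, the interface _Pred_ and the method _isValidResidue_ are hypothetical names, predicates are modelled as plain Java lambdas over two integer columns x and z, and SQL NULL semantics are ignored. It tests whether {{condition = target AND residue}} holds for every value combination in a small sample domain.

```java
import java.util.function.BiPredicate;

class ResidueCheck {
    /** Hypothetical predicate over the two columns of the example: x and z. */
    interface Pred extends BiPredicate<Integer, Integer> {}

    /**
     * Returns true if condition <=> (target AND residue) for every (x, z)
     * pair in a small sample domain. Two-valued logic only; NULLs are ignored.
     */
    static boolean isValidResidue(Pred condition, Pred target, Pred residue) {
        int[] domain = {0, 1, 2, 3};
        for (int x : domain) {
            for (int z : domain) {
                boolean lhs = condition.test(x, z);
                boolean rhs = target.test(x, z) && residue.test(x, z);
                if (lhs != rhs) {
                    return false; // found a counterexample, e.g. x = 1, z = 3
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Pred condition = (x, z) -> x == 1;
        Pred target = (x, z) -> x == 1 || z == 3;

        // Residue NOT(z = 3): fails for x = 1, z = 3.
        System.out.println(isValidResidue(condition, target, (x, z) -> z != 3));
        // Residue x = 1: holds for the whole sample domain.
        System.out.println(isValidResidue(condition, target, (x, z) -> x == 1));
    }
}
```

The first call prints false and the second prints true, matching the derivations above. An exhaustive check like this only covers the sampled values, so it can refute a residue but does not prove one in general.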
> Rewriting of queries using materialized views with joins and aggregates
> -----------------------------------------------------------------------
>
> Key: CALCITE-1731
> URL: https://issues.apache.org/jira/browse/CALCITE-1731
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Fix For: 1.13.0
>
>
> The idea is still to build a rewriting approach similar to:
> ftp://ftp.cse.buffalo.edu/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf
> I tried to build on CALCITE-1389 work. However, I ended up creating a
> new alternative rule. The main reason is that I wanted to follow the paper
> more closely and not rely on triggering rules within the MV rewriting to find
> whether expressions are equivalent. Instead, we extract information from the
> query plan and the MV plans using the new metadata providers proposed in
> CALCITE-1682, and then we use that information to validate and execute the
> rewriting.
> I also implemented new unifying/rewriting logic within the rule, since
> existing unifying rules for aggregates were assuming that aggregate inputs in
> the query and the MV needed to be equivalent (same Volcano node). That
> condition can be relaxed because we verify in the rule, by using the new
> metadata providers as stated above, that the result for the query is
> contained within the MV.
> I added multiple tests, but any feedback pointing to new tests that could be
> added to check correctness/coverage is welcome.
> The algorithm can trigger multiple rewritings for the same query node. In
> addition, multiple usages of the same table in the query/MVs are supported.
> A few extensions that will follow this issue:
> * Extend logic to filter relevant MVs for a given query node, so the approach
> scales as the number of MVs grows.
> * Produce rewritings using Union operators, e.g., a given query could be
> partially answered from the MV (_year = 2014_) and from the query
> (_not(year=2014)_). If the MV is stored e.g. in Druid, this rewriting might
> be beneficial. As with the other rewritings, decision on whether to finally
> use the rewriting should be cost-based.
> * Currently query and MV must use the same tables. This logic can be extended:
> - First, if the query uses a table that the MV does not, we can produce a
> rewriting that joins the MV with that additional table (given that join keys
> are available in the MV and we can compute all output columns).
> - Second, if the MV uses more tables than the query, we can recognize the
> cardinality-preserving joins to just project columns out of the MV and use it
> in the rewriting.