Jesus Camacho Rodriguez created CALCITE-1731:
------------------------------------------------

             Summary: Rewriting of queries using materialized views with joins and aggregates
                 Key: CALCITE-1731
                 URL: https://issues.apache.org/jira/browse/CALCITE-1731
             Project: Calcite
          Issue Type: New Feature
          Components: core
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez
             Fix For: 1.13.0


The idea is still to build a rewriting approach similar to:
ftp://ftp.cse.buffalo.edu/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf

I tried to build on the CALCITE-1389 work, but ended up creating a new, 
alternative rule. The main reason is that I wanted to follow the paper more 
closely and not rely on triggering rules within the MV rewriting to find out 
whether expressions are equivalent. Instead, we extract information from the 
query plan and the MV plans using the new metadata providers proposed in 
CALCITE-1682, and then we use that information to validate and execute the 
rewriting.

I also implemented new unifying/rewriting logic within the rule, since the 
existing unifying rules for aggregates assume that the aggregate inputs in the 
query and the MV are equivalent (same Volcano node). That condition can be 
relaxed because the rule verifies, using the new metadata providers mentioned 
above, that the result of the query is contained within the MV.
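
To make the relaxed condition concrete, here is a minimal sketch that uses 
Python's built-in sqlite3 module rather than Calcite itself. The table 
`sales`, the MV `mv_sales` and all column names are hypothetical and only serve 
as an illustration. The query filters on year and groups by fewer columns than 
the MV, so the aggregate inputs are not identical, yet the query result is 
contained in the MV and can be obtained by rolling up the MV's partial 
aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount INTEGER);
INSERT INTO sales VALUES
  (2013, 'EU', 'a', 10), (2014, 'EU', 'a', 20), (2014, 'EU', 'a', 1),
  (2014, 'EU', 'b', 5),  (2014, 'US', 'a', 7),  (2015, 'US', 'b', 3);
-- Hypothetical MV, materialized here as a plain table.
CREATE TABLE mv_sales AS
  SELECT year, region, product, SUM(amount) AS total, COUNT(*) AS cnt
  FROM sales GROUP BY year, region, product;
""")

# Query against the base table: extra filter, coarser grouping than the MV.
original = """
SELECT region, SUM(amount), COUNT(*) FROM sales
WHERE year >= 2014 GROUP BY region ORDER BY region
"""
# Rewriting over the MV: SUM rolls up as SUM, COUNT(*) as SUM of counts.
rewritten = """
SELECT region, SUM(total), SUM(cnt) FROM mv_sales
WHERE year >= 2014 GROUP BY region ORDER BY region
"""
original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows == rewritten_rows)
```

The rewriting is valid here because the filter column and the grouping column 
of the query are both available in the MV, and SUM and COUNT can be 
re-aggregated from partial results.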

I added multiple tests, but any feedback pointing to new tests that could be 
added to check correctness/coverage is welcome.

The algorithm can trigger multiple rewritings for the same query node. In 
addition, multiple usages of the same table in the query/MVs are supported.

A few extensions that will follow this issue:
* Extend the logic to filter relevant MVs for a given query node, so that the 
approach scales as the number of MVs grows.
* Produce rewritings using Union operators, e.g., a given query could be 
answered partly from the MV (_year = 2014_) and partly from the query 
(_not(year = 2014)_). If the MV is stored, e.g., in Druid, this rewriting might 
be beneficial. As with the other rewritings, the decision on whether to use 
the rewriting should be cost-based.
* Currently, the query and the MV must use the same tables. This logic can be 
extended:
- First, if the query uses a table that the MV does not, we can produce a 
rewriting that joins the MV with that additional table (provided that the join 
keys are available in the MV and we can compute all output columns).
- Second, if the MV uses more tables than the query, we can recognize 
cardinality-preserving joins and simply project the needed columns out of the 
MV for use in the rewriting.
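
Two of the proposed extensions can be sketched at the SQL level. The example 
below again uses Python's sqlite3 module instead of Calcite, and every table, 
MV and column name in it is hypothetical. The first pair of statements shows a 
Union rewriting where the MV covers only _year = 2014_ and the remainder is 
read from the base table (assuming `year` is never NULL, so that 
_not(year = 2014)_ covers all remaining rows). The second pair shows a query 
that uses one table more than the MV and is rewritten as a join of the MV with 
that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (year INTEGER NOT NULL, region TEXT, amount INTEGER);
CREATE TABLE regions (region TEXT PRIMARY KEY, manager TEXT);
INSERT INTO sales VALUES
  (2013, 'EU', 10), (2014, 'EU', 20), (2014, 'EU', 1),
  (2014, 'US', 7),  (2015, 'US', 3);
INSERT INTO regions VALUES ('EU', 'ann'), ('US', 'bob');
-- Hypothetical MVs, materialized here as plain tables.
CREATE TABLE mv_2014 AS
  SELECT region, SUM(amount) AS total FROM sales
  WHERE year = 2014 GROUP BY region;
CREATE TABLE mv_region AS
  SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

def rows(sql):
    """Run a statement and return all result rows."""
    return conn.execute(sql).fetchall()

# Union rewriting: 2014 comes from the MV, everything else from the base table.
union_original = rows(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
union_rewritten = rows("""
    SELECT region, SUM(total) FROM (
      SELECT region, total FROM mv_2014
      UNION ALL
      SELECT region, SUM(amount) AS total FROM sales
      WHERE NOT (year = 2014) GROUP BY region
    ) GROUP BY region ORDER BY region""")

# Additional table: the join key (region) is available in the MV.
join_original = rows("""
    SELECT r.manager, SUM(s.amount) FROM sales s
    JOIN regions r ON s.region = r.region
    GROUP BY r.manager ORDER BY r.manager""")
join_rewritten = rows("""
    SELECT r.manager, SUM(m.total) FROM mv_region m
    JOIN regions r ON m.region = r.region
    GROUP BY r.manager ORDER BY r.manager""")

print(union_original == union_rewritten, join_original == join_rewritten)
```

Both rewritings return the same rows as the original queries on this data. 
Whether they are actually cheaper depends on where the MV is stored and how 
large the remaining base-table scan is, which is why the choice should be left 
to the cost model.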



