Amogh, It would be nice if you can share the commit id.
1. #2.b "This also involves executing RexNode using RexImplExecutor² Wondering why expr needs to be evaluated? Is it cover functions involving literals. 2. how is expr ordering issue solved? Example: dept != 10 vs 10 != dept 3. Column aliasing: Select sales as a from t1 order by a Vs Select sales as b from t1 order by b Thanks John On 6/25/15, 12:35 PM, "Julian Hyde" <[email protected]> wrote: >This is great work. Certainly consistent with where I am heading. > >I would not be inclined to use DNF (because of its tendency to inflate >certain predicates) but if you are able to get something effective I >will happily use it. I think you should package it behind a method -- >"find out what is left to satisfy p when you have already satisfied q" >or something -- and write lots of tests of that method, and it doesn't >really matter what algorithm is behind it. > >Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses >on finding whether > > p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3) > >is satisfiable. > >Later we will want to know not just "can I satisfy query Q using >materialization M?" but "can I satisfy part of Q using M, and what is >left over?". I can convert most of Q to use an aggregate table over >years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for >June 1st onwards, that is a big win. > >What branch are you working on? Your master branch >https://github.com/qubole/incubator-calcite/tree/master seems to be >the same as apache/master right now. > >If you can divide this work into pull requests with unit tests, I will >happy commit each change as you make progress. > >By the way, I logged a few jira cases connected to materialized view >rewrite today. They were motivated by the phoenix team wanting to use >secondary indexes. But they could by applied to any scan-project-sort >materialization. See > >* https://issues.apache.org/jira/browse/CALCITE-771 >* https://issues.apache.org/jira/browse/CALCITE-772 >* https://issues.apache.org/jira/browse/CALCITE-773 > >Julian > >On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <[email protected]> wrote: >> Hi, >> We were working on a problem to detect if materialized view can be used >>to >> rewrite a query in non-trivial cases. Will briefly describe the problem >>and >> approach below and would appreciate feedback on the same. >> >> Motivation >> --------------- >> For instance there exists a table "logs" and a partition (materialized >> view) named "log_after_01_Jan" created on it and described by SQL : >> "Select * from logs where log_date > '01-01-2015' ". >> >> Assume that the table "log_after_01_Jan" is much smaller than table >>"logs". >> >> For user query: >> "Select log_body from logs where log_date > '03-03-2015' and >> char_length(log_body) < 100", >> we should detect that the materialized view "log_after_01_Jan" can be >>used >> and transform the query into: >> >> "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and >> char_length(log_body) < 100" >> >> Approach >> -------------- >> One of the fundamental problems we would come across here is to check >>if a >> boolean condition X implies (=>) Y. This quickly reduces to the >> Satisfiability problem which is NP complete for propositional logic. But >> there are many instances like above which can be detected easily. We >>have >> implemented an approach to handle several useful cases for few operators >> and types of operands. Will be extending it further for more types of >> operations. >> >> Top Level approach: >> >> 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite >>original >> query using MV using SubstitutionVisitor. Have extended >>SubstitutionVisitor >> to detect above cases and do the substitution. >> >> 2. To check if a condition X => Y, >> a. Convert both of them into Disjunctive Normal Form. >> Say X is transformed into x_1 or x_2 or x_3 ... or x_m and >> Y is transformed into y_1 or y_2 ,... or y_i, where any x_i and >>y_i >> are conjunctions of atomic predicates. >> For instance condition "(a>10 or b>20) and c <90" will be >>converted >> to DNF: (a>10 and c<90) or (b>20 and c<90). >> >> b. For X=>Y to be a tautology i.e., hold always true, every >>conjunction >> x_i should imply atleast one of the conjunction y_j. >> We wrote some set of simple heuristics to check if a conjunction >>of >> atomic predicates implies another. >> This also involves executing RexNode using RexImplExecutor. >> >> We have checked in code for this in our fork of >> calcite(qubole/incubator-calcite). This is ongoing work and we will be >> making many more improvements to it. If this is useful or anybody is >> interested in giving feedback then I can share the commit so that we can >> discuss about it and take it forward. >> >> Regards, >> Amogh >> Member of Technical Staff >> Qubole >
