Hi John, Thanks for the interest. Responses are inlined.
Regards, Amogh On Fri, Jun 26, 2015 at 3:13 AM, John Pullokkaran < [email protected]> wrote: > Amogh, > > It would be nice if you can share the commit id. > >> https://github.com/qubole/incubator-calcite/commit/19c1a70585cf9691d0e380c7fb85b7deaa86ad3d >> https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10 > > > 1. #2.b "This also involves executing RexNode using RexImplExecutor² > Wondering why expr needs to be evaluated? Is it cover functions involving > literals. >> To find out if x>30 => x>10 we evaluate x>10 by replacing x by 30. But we check lot of stuff to ensure this doesnot give false negatives. > > 2. how is expr ordering issue solved? > Example: dept != 10 vs 10 != dept > >> For the current set of binary operators we are handling this was straight forward. See the second commit for it: >> https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10 > > 3. Column aliasing: > Select sales as a from t1 order by a > Vs > Select sales as b from t1 order by b > >> We are planning to test around it. Gut feeling is that it should be handled >> as we do operations on RelNodes that I believe would have resolved aliasing. > > Thanks > John > > On 6/25/15, 12:35 PM, "Julian Hyde" <[email protected]> wrote: > > >This is great work. Certainly consistent with where I am heading. > > > >I would not be inclined to use DNF (because of its tendency to inflate > >certain predicates) but if you are able to get something effective I > >will happily use it. I think you should package it behind a method -- > >"find out what is left to satisfy p when you have already satisfied q" > >or something -- and write lots of tests of that method, and it doesn't > >really matter what algorithm is behind it. > > > >Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses > >on finding whether > > > > p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3) > > > >is satisfiable. > > > >Later we will want to know not just "can I satisfy query Q using > >materialization M?" but "can I satisfy part of Q using M, and what is > >left over?". I can convert most of Q to use an aggregate table over > >years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for > >June 1st onwards, that is a big win. > > > >What branch are you working on? Your master branch > >https://github.com/qubole/incubator-calcite/tree/master seems to be > >the same as apache/master right now. > > > >If you can divide this work into pull requests with unit tests, I will > >happy commit each change as you make progress. > > > >By the way, I logged a few jira cases connected to materialized view > >rewrite today. They were motivated by the phoenix team wanting to use > >secondary indexes. But they could by applied to any scan-project-sort > >materialization. See > > > >* https://issues.apache.org/jira/browse/CALCITE-771 > >* https://issues.apache.org/jira/browse/CALCITE-772 > >* https://issues.apache.org/jira/browse/CALCITE-773 > > > >Julian > > > >On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <[email protected]> > wrote: > >> Hi, > >> We were working on a problem to detect if materialized view can be used > >>to > >> rewrite a query in non-trivial cases. Will briefly describe the problem > >>and > >> approach below and would appreciate feedback on the same. > >> > >> Motivation > >> --------------- > >> For instance there exists a table "logs" and a partition (materialized > >> view) named "log_after_01_Jan" created on it and described by SQL : > >> "Select * from logs where log_date > '01-01-2015' ". > >> > >> Assume that the table "log_after_01_Jan" is much smaller than table > >>"logs". > >> > >> For user query: > >> "Select log_body from logs where log_date > '03-03-2015' and > >> char_length(log_body) < 100", > >> we should detect that the materialized view "log_after_01_Jan" can be > >>used > >> and transform the query into: > >> > >> "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and > >> char_length(log_body) < 100" > >> > >> Approach > >> -------------- > >> One of the fundamental problems we would come across here is to check > >>if a > >> boolean condition X implies (=>) Y. This quickly reduces to the > >> Satisfiability problem which is NP complete for propositional logic. But > >> there are many instances like above which can be detected easily. We > >>have > >> implemented an approach to handle several useful cases for few operators > >> and types of operands. Will be extending it further for more types of > >> operations. > >> > >> Top Level approach: > >> > >> 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite > >>original > >> query using MV using SubstitutionVisitor. Have extended > >>SubstitutionVisitor > >> to detect above cases and do the substitution. > >> > >> 2. To check if a condition X => Y, > >> a. Convert both of them into Disjunctive Normal Form. > >> Say X is transformed into x_1 or x_2 or x_3 ... or x_m and > >> Y is transformed into y_1 or y_2 ,... or y_i, where any x_i and > >>y_i > >> are conjunctions of atomic predicates. > >> For instance condition "(a>10 or b>20) and c <90" will be > >>converted > >> to DNF: (a>10 and c<90) or (b>20 and c<90). > >> > >> b. For X=>Y to be a tautology i.e., hold always true, every > >>conjunction > >> x_i should imply atleast one of the conjunction y_j. > >> We wrote some set of simple heuristics to check if a conjunction > >>of > >> atomic predicates implies another. > >> This also involves executing RexNode using RexImplExecutor. > >> > >> We have checked in code for this in our fork of > >> calcite(qubole/incubator-calcite). This is ongoing work and we will be > >> making many more improvements to it. If this is useful or anybody is > >> interested in giving feedback then I can share the commit so that we can > >> discuss about it and take it forward. > >> > >> Regards, > >> Amogh > >> Member of Technical Staff > >> Qubole > > > >
