Hi Julian, Small update: This are the commits done: >> https://github.com/qubole/incubator-calcite/commit/19c1a70585cf9691d0e380c7fb85b7deaa86ad3d >> https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10
Minor Correction in the earlier response: I meant (x>30 => x >10) instead of (x >10 => x >30) Regards, Amogh On Fri, Jun 26, 2015 at 10:11 AM, Amogh Margoor <[email protected]> wrote: > Hi Julian, > > Thanks a lot Julian for your feedback. I have inlined my response below > which also includes the commit done. > > Regards, > Amogh > > On Fri, Jun 26, 2015 at 1:05 AM, Julian Hyde <[email protected]> wrote: > >> This is great work. Certainly consistent with where I am heading. >> >> I would not be inclined to use DNF (because of its tendency to inflate >> certain predicates) but if you are able to get something effective I >> will happily use it. I think you should package it behind a method -- >> "find out what is left to satisfy p when you have already satisfied q" >> or something -- and write lots of tests of that method, and it doesn't >> really matter what algorithm is behind it. >> >> Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses >> on finding whether >> >> p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3) >> >> is satisfiable. >> > > >> I saw this method. I will try to use this in improvements to follow. > >> It didnot seem to solve this currently: (x>10 => x>30) i.e., find if > >> NOT (NOT(x>10 ) OR x >30) is satisfiable. > >> We have currently packaged it as "if X => Y" (see RexImplicationChecker > >> in the commit I shared below), but agree it should be > >> more generic like what you suggested above and something we can try to > achieve. > >> >> Later we will want to know not just "can I satisfy query Q using >> materialization M?" but "can I satisfy part of Q using M, and what is >> left over?". I can convert most of Q to use an aggregate table over >> years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for >> June 1st onwards, that is a big win. >> > > >> This certainly should be something we should aim at. > >> >> What branch are you working on? Your master branch >> https://github.com/qubole/incubator-calcite/tree/master seems to be >> the same as apache/master right now. >> >> >> We work on https://github.com/qubole/incubator-calcite/tree/qds-1.3 . > >> This is the commit: > https://github.com/qubole/incubator-calcite/pull/1/files?diff=unified > >> We are in the process of writing UTs for it. We did most of the testing > through our client code till now. > >> We have created new Visitor extending SustitutionVisitor because did > not want to mess with the existing code. > >> More rules need to be added to the new Visitor. > >> Will raise a PR once UTs are added and testing is complete. > > >> If you can divide this work into pull requests with unit tests, I will >> happy commit each change as you make progress. >> >> By the way, I logged a few jira cases connected to materialized view >> rewrite today. They were motivated by the phoenix team wanting to use >> secondary indexes. But they could by applied to any scan-project-sort >> materialization. See >> >> * https://issues.apache.org/jira/browse/CALCITE-771 >> * https://issues.apache.org/jira/browse/CALCITE-772 >> * https://issues.apache.org/jira/browse/CALCITE-773 >> > > >> Thanks for sharing this info Julian. Will definitely take a look. > >> >> Julian >> >> On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <[email protected]> >> wrote: >> > Hi, >> > We were working on a problem to detect if materialized view can be used >> to >> > rewrite a query in non-trivial cases. Will briefly describe the problem >> and >> > approach below and would appreciate feedback on the same. >> > >> > Motivation >> > --------------- >> > For instance there exists a table "logs" and a partition (materialized >> > view) named "log_after_01_Jan" created on it and described by SQL : >> > "Select * from logs where log_date > '01-01-2015' ". >> > >> > Assume that the table "log_after_01_Jan" is much smaller than table >> "logs". >> > >> > For user query: >> > "Select log_body from logs where log_date > '03-03-2015' and >> > char_length(log_body) < 100", >> > we should detect that the materialized view "log_after_01_Jan" can be >> used >> > and transform the query into: >> > >> > "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and >> > char_length(log_body) < 100" >> > >> > Approach >> > -------------- >> > One of the fundamental problems we would come across here is to check >> if a >> > boolean condition X implies (=>) Y. This quickly reduces to the >> > Satisfiability problem which is NP complete for propositional logic. But >> > there are many instances like above which can be detected easily. We >> have >> > implemented an approach to handle several useful cases for few operators >> > and types of operands. Will be extending it further for more types of >> > operations. >> > >> > Top Level approach: >> > >> > 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite >> original >> > query using MV using SubstitutionVisitor. Have extended >> SubstitutionVisitor >> > to detect above cases and do the substitution. >> > >> > 2. To check if a condition X => Y, >> > a. Convert both of them into Disjunctive Normal Form. >> > Say X is transformed into x_1 or x_2 or x_3 ... or x_m and >> > Y is transformed into y_1 or y_2 ,... or y_i, where any x_i and >> y_i >> > are conjunctions of atomic predicates. >> > For instance condition "(a>10 or b>20) and c <90" will be >> converted >> > to DNF: (a>10 and c<90) or (b>20 and c<90). >> > >> > b. For X=>Y to be a tautology i.e., hold always true, every >> conjunction >> > x_i should imply atleast one of the conjunction y_j. >> > We wrote some set of simple heuristics to check if a conjunction >> of >> > atomic predicates implies another. >> > This also involves executing RexNode using RexImplExecutor. >> > >> > We have checked in code for this in our fork of >> > calcite(qubole/incubator-calcite). This is ongoing work and we will be >> > making many more improvements to it. If this is useful or anybody is >> > interested in giving feedback then I can share the commit so that we can >> > discuss about it and take it forward. >> > >> > Regards, >> > Amogh >> > Member of Technical Staff >> > Qubole >> > >
