Hi Julian,

Small update:
This are the commits done:
>>
https://github.com/qubole/incubator-calcite/commit/19c1a70585cf9691d0e380c7fb85b7deaa86ad3d
>>
https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10

Minor Correction in the earlier response:
I meant (x>30 => x >10) instead of (x >10 => x >30)

Regards,
Amogh


On Fri, Jun 26, 2015 at 10:11 AM, Amogh Margoor <[email protected]> wrote:

> Hi Julian,
>
> Thanks a lot Julian for your feedback. I have inlined my response below
> which also includes the commit done.
>
> Regards,
> Amogh
>
> On Fri, Jun 26, 2015 at 1:05 AM, Julian Hyde <[email protected]> wrote:
>
>> This is great work. Certainly consistent with where I am heading.
>>
>> I would not be inclined to use DNF (because of its tendency to inflate
>> certain predicates) but if you are able to get something effective I
>> will happily use it. I think you should package it behind a method --
>> "find out what is left to satisfy p when you have already satisfied q"
>> or something -- and write lots of tests of that method, and it doesn't
>> really matter what algorithm is behind it.
>>
>> Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
>> on finding whether
>>
>>   p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)
>>
>> is satisfiable.
>>
>
> >> I saw this method. I will try to use this in improvements to follow.
> >> It didnot seem to solve this currently: (x>10 => x>30)  i.e., find if
> >>  NOT (NOT(x>10 )  OR  x >30) is satisfiable.
> >> We have currently packaged it as "if X => Y" (see RexImplicationChecker
> >> in the commit I shared below), but agree it should be
> >> more generic like what you suggested above and something we can try to
> achieve.
>
>>
>> Later we will want to know not just "can I satisfy query Q using
>> materialization M?" but "can I satisfy part of Q using M, and what is
>> left over?". I can convert most of Q to use an aggregate table over
>> years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
>> June 1st onwards, that is a big win.
>>
>
> >> This certainly should be something we should aim at.
>
>>
>> What branch are you working on? Your master branch
>> https://github.com/qubole/incubator-calcite/tree/master seems to be
>> the same as apache/master right now.
>>
>> >> We work on https://github.com/qubole/incubator-calcite/tree/qds-1.3 .
> >> This is the commit:
> https://github.com/qubole/incubator-calcite/pull/1/files?diff=unified
> >> We are in the process of writing UTs for it. We did most of the testing
> through our client code till now.
> >> We have created new Visitor extending SustitutionVisitor because did
> not want to mess with the existing code.
> >> More rules need to be added to the new Visitor.
> >> Will raise a PR once UTs are added and testing is complete.
>
>
>> If you can divide this work into pull requests with unit tests, I will
>> happy commit each change as you make progress.
>>
>> By the way, I logged a few jira cases connected to materialized view
>> rewrite today. They were motivated by the phoenix team wanting to use
>> secondary indexes. But they could by applied to any scan-project-sort
>> materialization. See
>>
>> * https://issues.apache.org/jira/browse/CALCITE-771
>> * https://issues.apache.org/jira/browse/CALCITE-772
>> * https://issues.apache.org/jira/browse/CALCITE-773
>>
>
> >> Thanks for sharing this info Julian. Will definitely take a look.
>
>>
>> Julian
>>
>> On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <[email protected]>
>> wrote:
>> > Hi,
>> > We were working on a problem to detect if materialized view can be used
>> to
>> > rewrite a query in non-trivial cases. Will briefly describe the problem
>> and
>> > approach below and would appreciate feedback on the same.
>> >
>> > Motivation
>> > ---------------
>> > For instance there exists a table "logs" and a partition (materialized
>> > view)  named "log_after_01_Jan" created on it and described by SQL :
>> > "Select * from logs where log_date > '01-01-2015' ".
>> >
>> > Assume that the table "log_after_01_Jan" is much smaller than  table
>> "logs".
>> >
>> >  For user query:
>> > "Select log_body from logs where log_date > '03-03-2015' and
>> > char_length(log_body) < 100",
>> > we should detect that the materialized view "log_after_01_Jan" can be
>> used
>> > and transform the query into:
>> >
>> > "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
>> > char_length(log_body) < 100"
>> >
>> > Approach
>> > --------------
>> > One of the fundamental problems we would come across here is to check
>> if a
>> > boolean condition X implies (=>) Y. This quickly reduces to the
>> > Satisfiability problem which is NP complete for propositional logic. But
>> > there are many instances like above which can be detected easily. We
>> have
>> > implemented an approach to handle several useful cases for few operators
>> > and types of operands. Will be extending it further for more types of
>> > operations.
>> >
>> > Top Level approach:
>> >
>> > 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite
>> original
>> > query using MV using SubstitutionVisitor. Have extended
>> SubstitutionVisitor
>> > to detect above cases and do the substitution.
>> >
>> > 2. To check if a condition X => Y,
>> >    a. Convert both of them into Disjunctive Normal Form.
>> >        Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
>> >        Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and
>> y_i
>> > are conjunctions of atomic predicates.
>> >        For instance condition "(a>10 or b>20) and c <90" will be
>> converted
>> > to DNF: (a>10 and c<90)  or (b>20 and c<90).
>> >
>> >    b. For X=>Y to be a tautology i.e., hold always true, every
>> conjunction
>> > x_i should imply atleast one of the conjunction y_j.
>> >        We wrote some set of simple heuristics to check if a conjunction
>> of
>> > atomic predicates implies another.
>> >       This also involves executing RexNode using RexImplExecutor.
>> >
>> > We have checked in code for this in our fork of
>> > calcite(qubole/incubator-calcite). This is ongoing work and we will be
>> > making many more improvements to it. If this is useful or anybody is
>> > interested in giving feedback then I can share the commit so that we can
>> > discuss about it and take it forward.
>> >
>> > Regards,
>> > Amogh
>> > Member of Technical Staff
>> > Qubole
>>
>
>

Reply via email to