Hi John,

Thanks for the interest. Responses are inlined.

Regards,
Amogh

On Fri, Jun 26, 2015 at 3:13 AM, John Pullokkaran <
[email protected]> wrote:

> Amogh,
>
>    It would be nice if you can share the commit id.
>

>>
https://github.com/qubole/incubator-calcite/commit/19c1a70585cf9691d0e380c7fb85b7deaa86ad3d
>>
https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10

>
>
> 1. #2.b "This also involves executing RexNode using RexImplExecutor²
> Wondering why expr needs to be evaluated? Is it cover functions involving
> literals.

>>  To find out if x>30 => x>10 we evaluate x>10 by replacing x by 30. But
we check lot of stuff to ensure this doesnot give false negatives.

>
> 2. how is expr ordering issue solved?
> Example: dept != 10 vs 10 != dept
>
>> For the current set of binary operators we are handling this was
straight forward. See  the second commit for it:
>>
https://github.com/qubole/incubator-calcite/commit/35b1ac3b2d1ce44fa6bd9f72ae8e329555276a10

>
> 3. Column aliasing:
> Select sales as a from t1 order by a
> Vs
> Select sales as b from t1  order by b
>

>> We are planning to test around it. Gut feeling is that it should be
handled
>> as we do operations on RelNodes that I believe would have resolved
aliasing.

>
> Thanks
> John
>
> On 6/25/15, 12:35 PM, "Julian Hyde" <[email protected]> wrote:
>
> >This is great work. Certainly consistent with where I am heading.
> >
> >I would not be inclined to use DNF (because of its tendency to inflate
> >certain predicates) but if you are able to get something effective I
> >will happily use it. I think you should package it behind a method --
> >"find out what is left to satisfy p when you have already satisfied q"
> >or something -- and write lots of tests of that method, and it doesn't
> >really matter what algorithm is behind it.
> >
> >Take a look at SubstitutionVisitor.simplfy(RexNode) and how it focuses
> >on finding whether
> >
> >  p1 AND p2 AND p3 AND NOT (q1 AND q2 AND q3)
> >
> >is satisfiable.
> >
> >Later we will want to know not just "can I satisfy query Q using
> >materialization M?" but "can I satisfy part of Q using M, and what is
> >left over?". I can convert most of Q to use an aggregate table over
> >years 2012 .. 2014 and 2015 Jan - May, and then scan the raw data for
> >June 1st onwards, that is a big win.
> >
> >What branch are you working on? Your master branch
> >https://github.com/qubole/incubator-calcite/tree/master seems to be
> >the same as apache/master right now.
> >
> >If you can divide this work into pull requests with unit tests, I will
> >happy commit each change as you make progress.
> >
> >By the way, I logged a few jira cases connected to materialized view
> >rewrite today. They were motivated by the phoenix team wanting to use
> >secondary indexes. But they could by applied to any scan-project-sort
> >materialization. See
> >
> >* https://issues.apache.org/jira/browse/CALCITE-771
> >* https://issues.apache.org/jira/browse/CALCITE-772
> >* https://issues.apache.org/jira/browse/CALCITE-773
> >
> >Julian
> >
> >On Thu, Jun 25, 2015 at 11:46 AM, Amogh Margoor <[email protected]>
> wrote:
> >> Hi,
> >> We were working on a problem to detect if materialized view can be used
> >>to
> >> rewrite a query in non-trivial cases. Will briefly describe the problem
> >>and
> >> approach below and would appreciate feedback on the same.
> >>
> >> Motivation
> >> ---------------
> >> For instance there exists a table "logs" and a partition (materialized
> >> view)  named "log_after_01_Jan" created on it and described by SQL :
> >> "Select * from logs where log_date > '01-01-2015' ".
> >>
> >> Assume that the table "log_after_01_Jan" is much smaller than  table
> >>"logs".
> >>
> >>  For user query:
> >> "Select log_body from logs where log_date > '03-03-2015' and
> >> char_length(log_body) < 100",
> >> we should detect that the materialized view "log_after_01_Jan" can be
> >>used
> >> and transform the query into:
> >>
> >> "Select log_body from log_after_01_Jan where log_date > '03-03-2015' and
> >> char_length(log_body) < 100"
> >>
> >> Approach
> >> --------------
> >> One of the fundamental problems we would come across here is to check
> >>if a
> >> boolean condition X implies (=>) Y. This quickly reduces to the
> >> Satisfiability problem which is NP complete for propositional logic. But
> >> there are many instances like above which can be detected easily. We
> >>have
> >> implemented an approach to handle several useful cases for few operators
> >> and types of operands. Will be extending it further for more types of
> >> operations.
> >>
> >> Top Level approach:
> >>
> >> 1. Currently, VolcanoPlanner:useMaterialization tries to rewrite
> >>original
> >> query using MV using SubstitutionVisitor. Have extended
> >>SubstitutionVisitor
> >> to detect above cases and do the substitution.
> >>
> >> 2. To check if a condition X => Y,
> >>    a. Convert both of them into Disjunctive Normal Form.
> >>        Say X is transformed into  x_1 or x_2 or x_3 ... or x_m and
> >>        Y is transformed into y_1 or y_2 ,... or  y_i, where any x_i and
> >>y_i
> >> are conjunctions of atomic predicates.
> >>        For instance condition "(a>10 or b>20) and c <90" will be
> >>converted
> >> to DNF: (a>10 and c<90)  or (b>20 and c<90).
> >>
> >>    b. For X=>Y to be a tautology i.e., hold always true, every
> >>conjunction
> >> x_i should imply atleast one of the conjunction y_j.
> >>        We wrote some set of simple heuristics to check if a conjunction
> >>of
> >> atomic predicates implies another.
> >>       This also involves executing RexNode using RexImplExecutor.
> >>
> >> We have checked in code for this in our fork of
> >> calcite(qubole/incubator-calcite). This is ongoing work and we will be
> >> making many more improvements to it. If this is useful or anybody is
> >> interested in giving feedback then I can share the commit so that we can
> >> discuss about it and take it forward.
> >>
> >> Regards,
> >> Amogh
> >> Member of Technical Staff
> >> Qubole
> >
>
>

Reply via email to