Re: Cosette / Apache Calcite

Shumo Chu Tue, 17 Oct 2017 17:07:40 -0700

Hi Michael and Calcite Devs:

That's great!


We also hijacked the test harness of RelOpRule Test and generated SQL using
before and after optimization logical plans. (The reason that I use query
plan rather than the input SQL is that when Calcite generates SQLs from
query plans, it adds the correct table reference of attributes. Cosette
currently cannot handle unqualified attribute references). We got 232
before and after SQL after some tweaking. You can find the SQL queries that
we get here: https://github.com/uwdb/Cosette/blob/master/examples/c
alcite/calcite_tests.json.

We then ran all of them using Cosette. Cosette can currently support 39 of
232 queries. Out of these 39 queries, the results are:

EQ           9
UNKNOWN     27
NEQ          3

EQ means our Coq backend found a valid proof for the equivalence of the two
queries.

UNKNOWN means our model checker could not find a counterexample to prove
that the queries are not equivalent (within a time bound of 3 secs). You
can interpret that as "likely equal".

NEQ means our model checker found a counterexample to prove the queries are
not equivalent. These are all the cases that there are some preconditions
that Calcite understands but we haven't yet put into Cosette. For example
(testPushSemiJoinPastJoinRuleLeft):

 query q1 `SELECT EMP.ENAME
                  FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
                   WHERE EMP.DEPTNO = DEPT.DEPTNO AND
                                 EMP.EMPNO = EMP0.EMPNO`;

query q2 `SELECT EMP1.ENAME
                 FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
                    ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP AS EMP2
                    ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT AS DEPT1
                    ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP AS EMP3
                    ON EMP1.EMPNO = EMP3.EMPNO`;


These two queries are equivalent under the precondition that EMPNO is the
primary key of EMP, and DEPTNO is the primary key of DEPT, and that is
indeed the case!

We are extending Cosette to support preconditions such as primary keys and
foreign keys now and we should be able to handle these soon.

For the queries that we cannot handle, here is the break down of the issues:

[image: Inline image 1]

We support a large part of group by queries, GROUPBY here means some
grouping features that we don't support, e.g., group by on arbitrary
expressions. Although we might not be able to support all of these SQL
features, we are definitely working on adding more SQL features to Cosette.

With all above, we can try to integrate Cosette as an extra layer to ensure
developer's confidence. Currently, we are working on SIGMOD deadline (Nov. 2)
but will devote more time into this afterward.

One thing worth mentioning is that we plan to include all these calcite
examples as the regression test for Cosette so that we can make Cosette
more powerful and practical. These queries are great benchmarks for Cosette
and are super valuable!

Meanwhile, It would be also great if calcite's logical plan to SQL
converter can be improved. It currently doesn't support all queries and in
some cases, it actually generates illegal SQL. (happy to submit detailed
bug report if you think that would be helpful).

Best
Shumo

On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <[email protected]> wrote:

> This took me longer than expected to get around to, but hopefully the below
> is helpful:
>
> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
>
> I just did some basic (and very hacky) instrumentation of RelOptRulesTest
> to dump SQL before and after rules have been applied. The file consists of
> the name of the test followed by the original and then the rewritten SQL.
>
> Many of the tests are missing for various reasons, but there's still 189
> examples there to play with. Let me know if any particular aspects of the
> SQL are problematic. The "before" SQL is handwritten for the tests and the
> "after" is ANSI SQL as generated by Calcite from the resulting logical
> plan.
>
> --
> Michael Mior
> [email protected]
>
> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <[email protected]>:
>
> > >>> There might be applications in materialized views. A query Q can use
> a
> > materialized view V if V covers Q. In other words if >>>Q == R(V) where R
> > is some sequence of relational operators. Given Q and V, Cosette could
> > perhaps analyze and either >>>return R (success) or return that V does
> not
> > cover Q (failure).
> >
> > >>This resembles the problem of deciding whether a given relation
> > (expressed as a query) is contained in another one. It will >>take some
> > work for Cosette to be able to handle this but it definitely sounds
> > interesting. Do you have an application in mind? >>One of them might be
> to
> > determine whether previously cached results can be used.
> >
> > One simple idea to start here is to replace a naive solver we have in
> > Calcite for checking if one predicate implies another predicate. We call
> it
> > RexImplicationChecker in Calcite and if we can replace or help it with
> > Constraint solver of Cosette which says if a particular implication is a
> > tautology then that would help a great deal.
> >
> >
> >
> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> [email protected]>
> > wrote:
> >
> > > Hi Julian et al,
> > >
> > > Thanks for your interest in Cosette. Your suggestions make a lot of
> > sense.
> > > We have done some initial work and would like to get your feedback on
> how
> > > to integrate the two tools together.
> > >
> > > > One obvious idea is to use Cosette to audit Calcite’s query
> > > transformation rules. Each rule is supposed to preserve semantics but
> > > (until Cosette) we had to trust the author of the rule. We could
> convert
> > > the before and after relational expressions to SQL, and then ask
> Cosette
> > > whether those are equivalent. We could enable this check in Calcite’s
> > test
> > > suite, during which many thousands of rules are fired.
> > >
> > > Indeed. We have browsed through the Calcite rules and reformulated a
> few
> > > of them using our Cosette language:
> > >
> > > 1. Conjunctive select (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.
> washington.edu/
> > > (click conjunctive select from the dropdown menu)
> > >
> > > 2. Join commute (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/JoinCommuteRule.java) --> Join commute from the demo website
> above
> > >
> > > 3. Join/Project transpose (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from the
> demo
> > > website above
> > >
> > > As we are not very familiar with the Calcite code base, we have tried
> our
> > > best to guess the intention of each rule based on the documentation,
> > please
> > > feel free to point out if we made mistakes.
> > >
> > > As you can see, the Cosette language is pretty much like standard SQL,
> > > except for declarations of schemas and relations. You will also notice
> > the
> > > "??" in some schema declarations (e.g., in rule 1. above) --- they
> stand
> > > for "symbolic" attributes that can represent any attribute. In other
> > words,
> > > if Cosette can prove that a rule with symbolic attributes is true, then
> > it
> > > will be true regardless of what the symbolic attributes are
> instantiated
> > > with. Symbolic predicates (e.g., in rule 1.) works similarly, hence
> > giving
> > > Cosette a mechanism to prove (or disprove) classes of rewrite rules.
> See
> > > our documentation at http://cosette.cs.washington.edu/guide for
> details.
> > >
> > > I believe the challenge here is how we can "reverse engineer" the
> > > intention of each of the existing rules so that they can be expressed
> in
> > > Cosette. Any suggestions on how to do this? We have a few students
> > working
> > > on Cosette and can help, but we will probably need help from Calcite to
> > > fully understand all of the existing rules. Another possibility is to
> > print
> > > out the input and output of each rule application during testing, and
> > send
> > > them to Cosette. If the printout is in a form that resembles SQL we can
> > > probably patch it into Cosette.
> > >
> > > For new rules, can we can ask Calcite authors to express them in
> Cosette
> > > as well, perhaps as part of the documentation? This way we will only
> need
> > > to handle the existing rules.
> > >
> > > > A few rules might use other information besides the input relational
> > > expression, such as predicates that are known to hold or column
> > > combinations that are known to be unique. But let’s see what happens.
> > >
> > > This is something that we are actively working on. Can you point us to
> > > specific rules with such properties? One possibility is the join
> > > commutativity rule noted above. You will notice that we didn't prove
> the
> > > "general form" of the rule with symbolic attributes, but rather one
> with
> > > concrete schemas. This is because Cosette currently implements the
> > unnamed
> > > approach to attribute naming (see Section 3.2 in
> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the general
> form
> > > of the rule is only true if we know that the two input schemas have
> > > distinct attributes.
> > >
> > > > This is a very loose integration of Cosette / Calcite, but we can
> make
> > > closer integrations (e.g. within the same JVM, even at runtime) as we
> > > discover synergies. After all, optimization and theorem-proving are
> > related
> > > endeavors.
> > >
> > > Agreed. Cosette is implemented using Coq and Racket. We realize that
> > those
> > > are not the most popular languages for implementing systems :) , so
> > Cosette
> > > comes with a POST API as well: http://cosette.cs.washington.
> > edu/guide#api
> > > . It takes in the program text written in Cosette, and returns the
> answer
> > > (or times out). Does this make it easier to run the tool? We are open
> to
> > > implementing other bindings as well.
> > >
> > > > Another area that would be useful would be to devise test data.
> > >
> > > How about this: Each SQL implementation has its own interpretation of
> > SQL,
> > > with Cosette being one of them. Let's implement different SQL semantics
> > > using Cosette (say, Calcite's and Postgres'). Then, given a query, ask
> > > Cosette to find a counterexample (i.e., an input relation) where the
> two
> > > implementations will return different results when executed on a given
> > > query. If such a counterexample exists, then Calcite developers can
> > > determine whether this is a "bug" or a "feature". Does this sound
> similar
> > > to what you have in mind?
> > >
> > > > There might be applications in materialized views. A query Q can use
> a
> > > materialized view V if V covers Q. In other words if Q == R(V) where R
> is
> > > some sequence of relational operators. Given Q and V, Cosette could
> > perhaps
> > > analyze and either return R (success) or return that V does not cover Q
> > > (failure).
> > >
> > > This resembles the problem of deciding whether a given relation
> > (expressed
> > > as a query) is contained in another one. It will take some work for
> > Cosette
> > > to be able to handle this but it definitely sounds interesting. Do you
> > have
> > > an application in mind? One of them might be to determine whether
> > > previously cached results can be used.
> > >
> > > We definitely see lots of synergies between the two tools. To start
> with
> > > something easy :) , I propose we first discuss how to use the current
> > > Cosette implementation to audit existing Calcite rules, and a way to
> > > integrate Cosette into development of future Calcite rules as part of
> > code
> > > review / regression tests. What do you think?
> > >
> > > Thanks,
> > > Alvin (on behalf of the Cosette team)
> > >
> >
>



-- 
shumochu.com

Re: Cosette / Apache Calcite

Reply via email to