Re: Cosette / Apache Calcite

Michael Mior Thu, 21 Sep 2017 12:33:38 -0700

There's is not a *tool* for conversion per-se, but
org.apache.calcite.rel.rel2sql.RelToSqlConverter provides an easy interface
which takes a RelNode (Calcite's description of a plan) as input and
produces SQL as output.


The schema for the test cases is defined programmatically in the init
method
of core/src/test/java/org/apache/calcite/test/MockCatalogReader.java. The
API should be fairly easy to follow.

No other rules involving constraints immediately to mind but perhaps one of
the veteran Calcite committers will have some suggestions.

--
Michael Mior
[email protected]

2017-09-21 3:50 GMT-04:00 Shumo Chu <[email protected]>:

> Thanks, Michael!
>
> We would like to try using Cosette to check the Calcite test cases in your
> link. Can you send us a pointer to the tool that converts Calcite query
> plans to SQL? Also, where is the schema for the test cases? Our plan is to
> call the tool to generate SQL from query plans, and then pass that along
> with the original SQL to Cosette to check. What do you think?
>
> In terms of the Cosette language document, you can find it here
> <http://cosette.cs.washington.edu/guide>. There is a section explaining
> the
> SQL features that we currently support. We will go through the rules in the
> test cases and see whether there are other features that we need to
> implement in Cosette.
>
> I have another question for you. We are extending Cosette to check rewrite
> rules in presence of constraints, such as keys and foreign keys. We found
> one such rule in Calcite's design document for materialized views:
> https://github.com/apache/calcite/blob/master/site/_docs/
> materialized_views.md. Would you know of other such cases in Calcite?
>
> On Tue, Sep 19, 2017 at 10:08 AM, Michael Mior <[email protected]> wrote:
>
> > Alvin,
> >
> > Sounds like an interesting proposal. While I'm certainly not as familiar
> > with the Calcite code base as Julian, I will chime in a little here. One
> > thing that may prove helpful is
> > core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml in
> the
> > Calcite source tree. This contains plans before and after different rules
> > have been applied.
> >
> > Although these are in the form of logical plans and not SQL, we have
> > existing machinery to convert plans back to SQL text. It seems like one
> > approach would be to mine the existing tests to get a pairs of SQL
> strings
> > for each existing rule being tested. Not sure this would be the best
> > approach long term, but it seems like a good way to get started. I'm
> happy
> > to try to hack together a script that could give this kind of output.
> >
> > I suspect part of this would include implementing what Alvin termed as a
> > Cosette interpretation of SQL (which Calcite refers to as a "dialect").
> If
> > this sounds interesting to others, I'll go ahead and see what I can come
> up
> > with.
> >
> > What's the best reference for the specific format of SQL accepted by
> > Cosette?
> >
> > --
> > Michael Mior
> > [email protected]
> >
> > 2017-09-19 11:59 GMT-04:00 Alvin Cheung <[email protected]>:
> >
> > > Hi Julian et al,
> > >
> > > Thanks for your interest in Cosette. Your suggestions make a lot of
> > sense.
> > > We have done some initial work and would like to get your feedback on
> how
> > > to integrate the two tools together.
> > >
> > > > One obvious idea is to use Cosette to audit Calcite’s query
> > > transformation rules. Each rule is supposed to preserve semantics but
> > > (until Cosette) we had to trust the author of the rule. We could
> convert
> > > the before and after relational expressions to SQL, and then ask
> Cosette
> > > whether those are equivalent. We could enable this check in Calcite’s
> > test
> > > suite, during which many thousands of rules are fired.
> > >
> > > Indeed. We have browsed through the Calcite rules and reformulated a
> few
> > > of them using our Cosette language:
> > >
> > > 1. Conjunctive select (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.
> washington.edu/
> > > (click conjunctive select from the dropdown menu)
> > >
> > > 2. Join commute (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/JoinCommuteRule.java) --> Join commute from the demo website
> above
> > >
> > > 3. Join/Project transpose (https://github.com/apache/cal
> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from the
> demo
> > > website above
> > >
> > > As we are not very familiar with the Calcite code base, we have tried
> our
> > > best to guess the intention of each rule based on the documentation,
> > please
> > > feel free to point out if we made mistakes.
> > >
> > > As you can see, the Cosette language is pretty much like standard SQL,
> > > except for declarations of schemas and relations. You will also notice
> > the
> > > "??" in some schema declarations (e.g., in rule 1. above) --- they
> stand
> > > for "symbolic" attributes that can represent any attribute. In other
> > words,
> > > if Cosette can prove that a rule with symbolic attributes is true, then
> > it
> > > will be true regardless of what the symbolic attributes are
> instantiated
> > > with. Symbolic predicates (e.g., in rule 1.) works similarly, hence
> > giving
> > > Cosette a mechanism to prove (or disprove) classes of rewrite rules.
> See
> > > our documentation at http://cosette.cs.washington.edu/guide for
> details.
> > >
> > > I believe the challenge here is how we can "reverse engineer" the
> > > intention of each of the existing rules so that they can be expressed
> in
> > > Cosette. Any suggestions on how to do this? We have a few students
> > working
> > > on Cosette and can help, but we will probably need help from Calcite to
> > > fully understand all of the existing rules. Another possibility is to
> > print
> > > out the input and output of each rule application during testing, and
> > send
> > > them to Cosette. If the printout is in a form that resembles SQL we can
> > > probably patch it into Cosette.
> > >
> > > For new rules, can we can ask Calcite authors to express them in
> Cosette
> > > as well, perhaps as part of the documentation? This way we will only
> need
> > > to handle the existing rules.
> > >
> > > > A few rules might use other information besides the input relational
> > > expression, such as predicates that are known to hold or column
> > > combinations that are known to be unique. But let’s see what happens.
> > >
> > > This is something that we are actively working on. Can you point us to
> > > specific rules with such properties? One possibility is the join
> > > commutativity rule noted above. You will notice that we didn't prove
> the
> > > "general form" of the rule with symbolic attributes, but rather one
> with
> > > concrete schemas. This is because Cosette currently implements the
> > unnamed
> > > approach to attribute naming (see Section 3.2 in
> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the general
> form
> > > of the rule is only true if we know that the two input schemas have
> > > distinct attributes.
> > >
> > > > This is a very loose integration of Cosette / Calcite, but we can
> make
> > > closer integrations (e.g. within the same JVM, even at runtime) as we
> > > discover synergies. After all, optimization and theorem-proving are
> > related
> > > endeavors.
> > >
> > > Agreed. Cosette is implemented using Coq and Racket. We realize that
> > those
> > > are not the most popular languages for implementing systems :) , so
> > Cosette
> > > comes with a POST API as well: http://cosette.cs.washington.
> > edu/guide#api
> > > . It takes in the program text written in Cosette, and returns the
> answer
> > > (or times out). Does this make it easier to run the tool? We are open
> to
> > > implementing other bindings as well.
> > >
> > > > Another area that would be useful would be to devise test data.
> > >
> > > How about this: Each SQL implementation has its own interpretation of
> > SQL,
> > > with Cosette being one of them. Let's implement different SQL semantics
> > > using Cosette (say, Calcite's and Postgres'). Then, given a query, ask
> > > Cosette to find a counterexample (i.e., an input relation) where the
> two
> > > implementations will return different results when executed on a given
> > > query. If such a counterexample exists, then Calcite developers can
> > > determine whether this is a "bug" or a "feature". Does this sound
> similar
> > > to what you have in mind?
> > >
> > > > There might be applications in materialized views. A query Q can use
> a
> > > materialized view V if V covers Q. In other words if Q == R(V) where R
> is
> > > some sequence of relational operators. Given Q and V, Cosette could
> > perhaps
> > > analyze and either return R (success) or return that V does not cover Q
> > > (failure).
> > >
> > > This resembles the problem of deciding whether a given relation
> > (expressed
> > > as a query) is contained in another one. It will take some work for
> > Cosette
> > > to be able to handle this but it definitely sounds interesting. Do you
> > have
> > > an application in mind? One of them might be to determine whether
> > > previously cached results can be used.
> > >
> > > We definitely see lots of synergies between the two tools. To start
> with
> > > something easy :) , I propose we first discuss how to use the current
> > > Cosette implementation to audit existing Calcite rules, and a way to
> > > integrate Cosette into development of future Calcite rules as part of
> > code
> > > review / regression tests. What do you think?
> > >
> > > Thanks,
> > > Alvin (on behalf of the Cosette team)
> > >
> >
>
>
>
> --
> shumochu.com
>

Re: Cosette / Apache Calcite

Reply via email to