Hi Julian et al,
Thanks for your interest in Cosette. Your suggestions make a lot of
sense. We have done some initial work and would like to get your
feedback on how to integrate the two tools together.
> One obvious idea is to use Cosette to audit Calcite’s query
transformation rules. Each rule is supposed to preserve semantics but
(until Cosette) we had to trust the author of the rule. We could convert
the before and after relational expressions to SQL, and then ask Cosette
whether those are equivalent. We could enable this check in Calcite’s
test suite, during which many thousands of rules are fired.
Indeed. We have browsed through the Calcite rules and reformulated a few
of them using our Cosette language:
1. Conjunctive select
(https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/FilterMergeRule.java)
--> https://demo.cosette.cs.washington.edu/ (click conjunctive select
from the dropdown menu)
2. Join commute
(https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/JoinCommuteRule.java)
--> Join commute from the demo website above
3. Join/Project transpose
(https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/JoinProjectTransposeRule.java)
--> Join Proj. Trans. from the demo website above
As we are not very familiar with the Calcite code base, we have tried
our best to guess the intention of each rule based on the documentation,
please feel free to point out if we made mistakes.
As you can see, the Cosette language is pretty much like standard SQL,
except for declarations of schemas and relations. You will also notice
the "??" in some schema declarations (e.g., in rule 1. above) --- they
stand for "symbolic" attributes that can represent any attribute. In
other words, if Cosette can prove that a rule with symbolic attributes
is true, then it will be true regardless of what the symbolic attributes
are instantiated with. Symbolic predicates (e.g., in rule 1.) works
similarly, hence giving Cosette a mechanism to prove (or disprove)
classes of rewrite rules. See our documentation at
http://cosette.cs.washington.edu/guide for details.
I believe the challenge here is how we can "reverse engineer" the
intention of each of the existing rules so that they can be expressed in
Cosette. Any suggestions on how to do this? We have a few students
working on Cosette and can help, but we will probably need help from
Calcite to fully understand all of the existing rules. Another
possibility is to print out the input and output of each rule
application during testing, and send them to Cosette. If the printout is
in a form that resembles SQL we can probably patch it into Cosette.
For new rules, can we can ask Calcite authors to express them in Cosette
as well, perhaps as part of the documentation? This way we will only
need to handle the existing rules.
> A few rules might use other information besides the input relational
expression, such as predicates that are known to hold or column
combinations that are known to be unique. But let’s see what happens.
This is something that we are actively working on. Can you point us to
specific rules with such properties? One possibility is the join
commutativity rule noted above. You will notice that we didn't prove the
"general form" of the rule with symbolic attributes, but rather one with
concrete schemas. This is because Cosette currently implements the
unnamed approach to attribute naming (see Section 3.2 in
http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the general form
of the rule is only true if we know that the two input schemas have
distinct attributes.
> This is a very loose integration of Cosette / Calcite, but we can
make closer integrations (e.g. within the same JVM, even at runtime) as
we discover synergies. After all, optimization and theorem-proving are
related endeavors.
Agreed. Cosette is implemented using Coq and Racket. We realize that
those are not the most popular languages for implementing systems :) ,
so Cosette comes with a POST API as well:
http://cosette.cs.washington.edu/guide#api . It takes in the program
text written in Cosette, and returns the answer (or times out). Does
this make it easier to run the tool? We are open to implementing other
bindings as well.
> Another area that would be useful would be to devise test data.
How about this: Each SQL implementation has its own interpretation of
SQL, with Cosette being one of them. Let's implement different SQL
semantics using Cosette (say, Calcite's and Postgres'). Then, given a
query, ask Cosette to find a counterexample (i.e., an input relation)
where the two implementations will return different results when
executed on a given query. If such a counterexample exists, then Calcite
developers can determine whether this is a "bug" or a "feature". Does
this sound similar to what you have in mind?
> There might be applications in materialized views. A query Q can use
a materialized view V if V covers Q. In other words if Q == R(V) where R
is some sequence of relational operators. Given Q and V, Cosette could
perhaps analyze and either return R (success) or return that V does not
cover Q (failure).
This resembles the problem of deciding whether a given relation
(expressed as a query) is contained in another one. It will take some
work for Cosette to be able to handle this but it definitely sounds
interesting. Do you have an application in mind? One of them might be to
determine whether previously cached results can be used.
We definitely see lots of synergies between the two tools. To start with
something easy :) , I propose we first discuss how to use the current
Cosette implementation to audit existing Calcite rules, and a way to
integrate Cosette into development of future Calcite rules as part of
code review / regression tests. What do you think?
Thanks,
Alvin (on behalf of the Cosette team)