Re: Cosette / Apache Calcite

Alvin Cheung Tue, 19 Sep 2017 08:59:50 -0700

Hi Julian et al,

Thanks for your interest in Cosette. Your suggestions make a lot of sense. We have done some initial work and would like to get your feedback on how to integrate the two tools together.

> One obvious idea is to use Cosette to audit Calcite’s query transformation rules. Each rule is supposed to preserve semantics but (until Cosette) we had to trust the author of the rule. We could convert the before and after relational expressions to SQL, and then ask Cosette whether those are equivalent. We could enable this check in Calcite’s test suite, during which many thousands of rules are fired.

Indeed. We have browsed through the Calcite rules and reformulated a few of them using our Cosette language:

1. Conjunctive select (https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/FilterMergeRule.java) --> https://demo.cosette.cs.washington.edu/ (click conjunctive select from the dropdown menu)

2. Join commute (https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/JoinCommuteRule.java) --> Join commute from the demo website above

3. Join/Project transpose (https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/JoinProjectTransposeRule.java) --> Join Proj. Trans. from the demo website above

As we are not very familiar with the Calcite code base, we have tried our best to guess the intention of each rule based on the documentation. Please feel free to point out any mistakes we made.

As you can see, the Cosette language is pretty much like standard SQL, except for declarations of schemas and relations. You will also notice the "??" in some schema declarations (e.g., in rule 1 above). These stand for "symbolic" attributes that can represent any attribute. In other words, if Cosette can prove that a rule with symbolic attributes is true, then it will be true regardless of what the symbolic attributes are instantiated with. Symbolic predicates (e.g., in rule 1) work similarly, hence giving Cosette a mechanism to prove (or disprove) classes of rewrite rules. See our documentation at http://cosette.cs.washington.edu/guide for details.
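
To give a feel for what "true for any predicate" means, here is a minimal Python sketch. It is not Cosette code and not how Cosette works internally: instead of a proof, it exhaustively checks the conjunctive select rule for every predicate over a tiny domain and every small single-column bag. The names DOMAIN, all_predicates, select and conjunctive_select_holds are made up for this illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

DOMAIN = [0, 1, 2]  # tiny, assumed value domain for a single-column relation


def all_predicates(domain):
    """Yield every boolean predicate over the domain (2**len(domain) of them)."""
    for bits in product([False, True], repeat=len(domain)):
        table = dict(zip(domain, bits))
        yield lambda v, table=table: table[v]


def select(pred, rel):
    """Bag-semantics selection: keep rows (with multiplicity) satisfying pred."""
    return Counter({row: n for row, n in rel.items() if pred(row)})


def conjunctive_select_holds(domain, max_rows=3):
    """Check select(p2, select(p1, R)) == select(p1 AND p2, R) on all small bags."""
    preds = list(all_predicates(domain))
    for n in range(max_rows + 1):
        for rows in combinations_with_replacement(domain, n):
            rel = Counter(rows)
            for p1, p2 in product(preds, repeat=2):
                nested = select(p2, select(p1, rel))
                merged = select(lambda v: p1(v) and p2(v), rel)
                if nested != merged:
                    return False
    return True


print(conjunctive_select_holds(DOMAIN))
```

A bounded check like this can only find counterexamples within the bound. A proof with symbolic predicates, as described above, covers all instantiations at once.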

I believe the challenge here is how we can "reverse engineer" the intention of each of the existing rules so that they can be expressed in Cosette. Any suggestions on how to do this? We have a few students working on Cosette and can help, but we will probably need help from Calcite to fully understand all of the existing rules. Another possibility is to print out the input and output of each rule application during testing, and send them to Cosette. If the printout is in a form that resembles SQL we can probably patch it into Cosette.

For new rules, can we ask Calcite authors to express them in Cosette as well, perhaps as part of the documentation? This way we will only need to handle the existing rules.

> A few rules might use other information besides the input relational expression, such as predicates that are known to hold or column combinations that are known to be unique. But let’s see what happens.

This is something that we are actively working on. Can you point us to specific rules with such properties? One possibility is the join commutativity rule noted above. You will notice that we didn't prove the "general form" of the rule with symbolic attributes, but rather one with concrete schemas. This is because Cosette currently implements the unnamed approach to attribute naming (see Section 3.2 in http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the general form of the rule is only true if we know that the two input schemas have distinct attributes.
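
As a rough illustration of why positional (unnamed) attributes make join commutativity delicate, here is a small Python sketch. It is only a toy model with made-up relations R and S and hypothetical helpers join and swap_sides, not Cosette's semantics: rows are plain tuples, columns are identified by position, and the two join orders only agree once the columns are put back in the same order.

```python
from collections import Counter
from itertools import product


def join(left, right, pred):
    """Bag-semantics join of unnamed relations; pred sees the concatenated row."""
    return Counter(a + b for a, b in product(left, right) if pred(a + b))


def swap_sides(bag, left_width):
    """Move the first left_width columns of every row to the end."""
    out = Counter()
    for row, n in bag.items():
        out[row[left_width:] + row[:left_width]] += n
    return out


# Two made-up relations with two columns each; column 0 is the join key.
R = [(1, "x"), (2, "y")]
S = [(1, 10.0), (1, 20.0)]

# Join on "first column of the left input = first column of the right input".
r_join_s = join(R, S, lambda t: t[0] == t[2])
s_join_r = join(S, R, lambda t: t[0] == t[2])

plain_equal = r_join_s == s_join_r
equal_after_projection = r_join_s == swap_sides(s_join_r, left_width=2)

print(plain_equal, equal_after_projection)
```

In this toy model the two results differ as bags of positional tuples and coincide only after the reordering step, which is the kind of side condition a general, symbolic statement of the rule would have to capture.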

> This is a very loose integration of Cosette / Calcite, but we can make closer integrations (e.g. within the same JVM, even at runtime) as we discover synergies. After all, optimization and theorem-proving are related endeavors.

Agreed. Cosette is implemented using Coq and Racket. We realize that those are not the most popular languages for implementing systems :) , so Cosette comes with a POST API as well: http://cosette.cs.washington.edu/guide#api . It takes in the program text written in Cosette, and returns the answer (or times out). Does this make it easier to run the tool? We are open to implementing other bindings as well.


> Another area that would be useful would be to devise test data.

How about this: Each SQL implementation has its own interpretation of SQL, with Cosette being one of them. Let's implement different SQL semantics using Cosette (say, Calcite's and Postgres'). Then, given a query, ask Cosette to find a counterexample (i.e., an input relation) where the two implementations will return different results when executed on a given query. If such a counterexample exists, then Calcite developers can determine whether this is a "bug" or a "feature". Does this sound similar to what you have in mind?
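
A toy version of that counterexample search might look like the Python sketch below. Everything in it is an assumption made for illustration: the two "semantics" are hand-written functions for the query SELECT a FROM rel WHERE a <> 1, one using three-valued logic for NULL and one from a hypothetical dialect that treats NULL <> 1 as true. Neither is meant to describe the actual behavior of Calcite or Postgres, and the search is a plain bounded enumeration rather than what Cosette does.

```python
from collections import Counter
from itertools import combinations_with_replacement

DOMAIN = [None, 0, 1]  # None stands in for SQL NULL


def query_three_valued(rel):
    """WHERE a <> 1 with NULL <> 1 evaluating to unknown (row is dropped)."""
    return Counter(a for (a,) in rel if a is not None and a != 1)


def query_two_valued(rel):
    """Hypothetical dialect in which NULL <> 1 evaluates to true."""
    return Counter(a for (a,) in rel if a != 1)


def find_counterexample(q1, q2, max_rows=2):
    """Return the smallest input relation on which q1 and q2 disagree, if any."""
    for n in range(max_rows + 1):
        for rows in combinations_with_replacement(DOMAIN, n):
            rel = [(a,) for a in rows]
            if q1(rel) != q2(rel):
                return rel
    return None


print(find_counterexample(query_three_valued, query_two_valued))
```

Here the search returns the one-row relation containing only NULL, which is the kind of small witness a developer could inspect to decide whether the difference is a "bug" or a "feature".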

> There might be applications in materialized views. A query Q can use a materialized view V if V covers Q. In other words if Q == R(V) where R is some sequence of relational operators. Given Q and V, Cosette could perhaps analyze and either return R (success) or return that V does not cover Q (failure).

This resembles the problem of deciding whether a given relation (expressed as a query) is contained in another one. It will take some work for Cosette to be able to handle this but it definitely sounds interesting. Do you have an application in mind? One of them might be to determine whether previously cached results can be used.

We definitely see lots of synergies between the two tools. To start with something easy :) , I propose we first discuss how to use the current Cosette implementation to audit existing Calcite rules, and a way to integrate Cosette into development of future Calcite rules as part of code review / regression tests. What do you think?


Thanks,
Alvin (on behalf of the Cosette team)
