A couple of weeks ago Vladimir pointed out a new research project from UW that checks whether two SQL statements are equivalent. It is called Cosette[1] and there is a nice blog post in Medium about it[2]. I have conversed with the researchers (Alvin and Shumo) back-channel, but let’s have a discussion whether there are any synergies between Apache Calcite and Cosette.
The researchers are on Bcc. I suggest that they join dev@calcite to reply and see other messages. One obvious idea is to use Cosette to audit Calcite’s query transformation rules. Each rule is supposed to preserve semantics but (until Cosette) we had to trust the author of the rule. We could convert the before and after relational expressions to SQL, and then ask Cosette whether those are equivalent. We could enable this check in Calcite’s test suite, during which many thousands of rules are fired. I have logged https://issues.apache.org/jira/browse/CALCITE-1977 <https://issues.apache.org/jira/browse/CALCITE-1977> for this. A few rules might use other information besides the input relational expression, such as predicates that are known to hold or column combinations that are known to be unique. But let’s see what happens. This is a very loose integration of Cosette / Calcite, but we can make closer integrations (e.g. within the same JVM, even at runtime) as we discover synergies. After all, optimization and theorem-proving are related endeavors. Another area that would be useful would be to devise test data. Let’s suppose that I am testing Calcite’s implementation of select * from emp where sal = 100 or sal not in (select age + 100 from emp where deptno > 20) against a reference implementation (say Postgres) that I know gives the correct answer. Even if Calcite has a bug, it may not show up in the default “emp” data set. There are a lot of combinations of sal null / not-null, deptno null / not-null, and emp empty non / non-empty, and so forth. I presume that Cosette is not a reference implementation of SQL, but could Cosette perhaps help suggest data sets that would highlight incompatibilities? There might be applications in materialized views. A query Q can use a materialized view V if V covers Q. In other words if Q == R(V) where R is some sequence of relational operators. Given Q and V, Cosette could perhaps analyze and either return R (success) or return that V does not cover Q (failure). Any other ideas? Julian [1] http://cosette.cs.washington.edu/ <http://cosette.cs.washington.edu/> [2] https://medium.com/@uwdb/introducing-cosette-527898504bd6 <https://medium.com/@uwdb/introducing-cosette-527898504bd6>
