Hello,
I am working to test our calcite-based compiler (https://github.com/feldera/dbsp/sql-to-dbsp-compiler), and since I am lazy I am borrowing tests from other open-source test suites, like Postgres. I am finding bugs in Calcite with a relatively high frequency, as you may have noticed if you follow the JIRA. I would say that one in 3 SQL functions I test turns out to have some problems. So our techniques seem to be effective at finding bugs. I think that some of the techniques we are using could be applied to Calcite as well: * First, we run all tests with and without the optimizer. In particular, we go through all the "constant evaluation" rules of Calcite. For constant expressions the results should be identical with and without optimizer. They aren't always, and we have found quite a few cases where the compile-time evaluation crashes or produces wrong results. So the compile-time and the run-time evaluator cross-check each other. * Second, we have tried to make it very easy to write end-to-end tests, at least positive tests (which are supposed to return a value rather than an error). Here is a stylized excerpt from our testing code: @Override public void prepareData() { this.executeStatements("CREATE TABLE FLOAT4_TBL (f1 float4);\n" + "INSERT INTO FLOAT4_TBL(f1) VALUES (' 0.0');\n" + "INSERT INTO FLOAT4_TBL(f1) VALUES ('1004.30 ');\n" + "INSERT INTO FLOAT4_TBL(f1) VALUES (' -34.84 ');\n" + "INSERT INTO FLOAT4_TBL(f1) VALUES ('1.2345678901234e+20');\n" + "INSERT INTO FLOAT4_TBL(f1) VALUES ('1.2345678901234e-20');"); } @Test public void testFPArithmetic() { this.qs("SELECT f.f1, f.f1 * '-10' AS x FROM FLOAT4_TBL f\n" + " WHERE f.f1 > '0.0';\n" + " f1 | x \n" + "---------------+----------------\n" + " 1004.3 | -10043\n" + " 1.2345679e+20 | -1.2345678e+21\n" + " 1.2345679e-20 | -1.2345678e-19\n" + "(3 rows)\n" + "\n" + "SELECT f.f1, f.f1 + '-10' AS x FROM FLOAT4_TBL f\n" + " WHERE f.f1 > '0.0';\n" + " f1 | x \n" + "---------------+---------------\n" + " 1004.3 | 994.3\n" + " 1.2345679e+20 | 1.2345679e+20\n" + " 1.2345679e-20 | -10\n" + "(3 rows)\n" + "\n"); } These long strings are an almost direct copy-and-paste from tests in https://github.com/postgres/postgres/blob/03734a7fed7d924679770adb78a7db8a37 d14188/src/test/regress/expected/float4.out (but they have to manually checked, since Calcite does often not match Postgres behaviors). A string can contain multiple query-expected result strings. The point is that the barrier for writing the tests is quite low. Granted, this approach has some weaknesses as well, in particular, we rely on a specific output format and is brittle in some respects (e.g., spaces in output strings). I have seen some *.iq files as resources in the Calcite source code, are these used in the same way for testing in Calcite? If yes, that's great, and there should be many more. Calcite has lots of unit tests, but I find that our "end-to-end" tests are easier to write and have much better code coverage. For example, for each failure I find in Calcite I have to dig pretty hard to figure out which of the testing files should contain my reproduction (e.g., SqlOperatorTest, RelOptRulesTest, RelToSqlConverterTest, etc.), and how to exactly write the reproduction (some reproductions require editing some huge XML files too). That takes a lot of time. I am not necessarily signing up to build this infrastructure, at least not right away. I am not sure how much of the stuff I wrote for our compiler could be ported directly to Calcite. But I will think about it, and I will gladly help this effort. I appreciate any comments and suggestions, Mihai
