The correct link is 
https://github.com/feldera/dbsp/tree/main/sql-to-dbsp-compiler, but it's not 
particularly important, I apologize for the broken one.

I appreciate you pointing out the Sql Logic Test project, I wrote that code 
too. But these two projects are almost entirely disjoint. SLT brings a few 
million pre-written tests. But none of the SLT tests exercises any of the SQL 
functions. So you need to write many more tests for each *function* 
implemented, and most contributions to Calcite lately are functions in 
different dialects.

Moreover, while SLT found quite a few bugs, I only had time to file one of them 
so far. For each failing test I have to figure out (1) whether it's a 
duplicate, (2) to create a minimal reproduction, (3) to figure out which part 
of the compiler is faulty in order to understand where to insert the test, (4) 
create a reproduction tailored for the faulty  component. Reproductions for 
planner bugs are different than reproductions for library bugs. But some bugs 
happen only if you combine some planner rules with some functions.

With the proposal below you only need a CLI to a database to generate new tests 
(e.g., BigQuery): you execute write the queries in the CLI, then copy-paste the 
output into a test. This makes it easier to file bugs, but harder to diagnose 
them. But it also makes it much easier to write lots of tests, because it 
enables people without Calcite expertise to write the tests.

Mihai

-----Original Message-----
From: stanilovsky evgeny 
Sent: Friday, July 28, 2023 2:43 AM
To: [email protected]
Subject: Re: Easier and more comprehensive testing 

Hello, your github link doesn`t open.
plz check discussion here in dev list titled:

Running Sql Logic Tests for Calcite
This is the JIRA case: https://issues.apache.org/jira/browse/CALCITE-5615
And this is the PR: https://github.com/apache/calcite/pull/3145


> Hello,
>
>
> I am working to test our calcite-based compiler 
> (https://github.com/feldera/dbsp/sql-to-dbsp-compiler), and since I am 
> lazy I am borrowing tests from other open-source test suites, like 
> Postgres.
>
> I am finding bugs in Calcite with a relatively high frequency, as you 
> may have noticed if you follow the JIRA. I would say that one in 3 SQL 
> functions I test turns out to have some problems. So our techniques 
> seem to be effective at finding bugs.
>
> I think that some of the techniques we are using could be applied to 
> Calcite as well:
>
>
> *     First, we run all tests with and without the optimizer. In
> particular, we go through all the "constant evaluation" rules of Calcite.
> For constant expressions the results should be identical with and 
> without optimizer. They aren't always, and we have found quite a few 
> cases where the compile-time evaluation crashes or produces wrong 
> results. So the compile-time and the run-time evaluator cross-check 
> each other.
> *     Second, we have tried to make it very easy to write end-to-end
> tests, at least positive tests (which are supposed to return a value 
> rather than an error). Here is a stylized excerpt from our testing 
> code:
>
>
> @Override
>
> public void prepareData() {
>
>         this.executeStatements("CREATE TABLE FLOAT4_TBL (f1  float4);\n"  
> +
>
>                 "INSERT INTO FLOAT4_TBL(f1) VALUES ('    0.0');\n" +
>
>                 "INSERT INTO FLOAT4_TBL(f1) VALUES ('1004.30   ');\n" +
>
>                 "INSERT INTO FLOAT4_TBL(f1) VALUES ('     -34.84     
> ');\n" +
>
>                 "INSERT INTO FLOAT4_TBL(f1) VALUES 
> ('1.2345678901234e+20');\n" +
>
>                 "INSERT INTO FLOAT4_TBL(f1) VALUES 
> ('1.2345678901234e-20');");
>
> }
>
>
> @Test
>
> public void testFPArithmetic() {
>
>         this.qs("SELECT f.f1, f.f1 * '-10' AS x FROM FLOAT4_TBL f\n" +
>
>                 "   WHERE f.f1 > '0.0';\n" +
>
>                 "      f1       |       x        \n" +
>
>                 "---------------+----------------\n" +
>
>                 "        1004.3 |         -10043\n" +
>
>                 " 1.2345679e+20 | -1.2345678e+21\n" +
>
>                 " 1.2345679e-20 | -1.2345678e-19\n" +
>
>                 "(3 rows)\n" +
>
>                 "\n" +
>
>                 "SELECT f.f1, f.f1 + '-10' AS x FROM FLOAT4_TBL f\n" +
>
>                 "   WHERE f.f1 > '0.0';\n" +
>
>                 "      f1       |       x       \n" +
>
>                 "---------------+---------------\n" +
>
>                 "        1004.3 |         994.3\n" +
>
>                 " 1.2345679e+20 | 1.2345679e+20\n" +
>
>                 " 1.2345679e-20 |           -10\n" +
>
>                 "(3 rows)\n" +
>
>                 "\n");
>
>        }
>
>
> These long strings are an almost direct copy-and-paste from tests in
> https://github.com/postgres/postgres/blob/03734a7fed7d924679770adb78a7
> db8a37 d14188/src/test/regress/expected/float4.out (but they have to 
> manually checked, since Calcite does often not match Postgres 
> behaviors). A string can contain multiple query-expected result 
> strings.
>
>
> The point is that the barrier for writing the tests is quite low.
>
> Granted, this approach has some weaknesses as well, in particular, we 
> rely on a specific output format and is brittle in some respects 
> (e.g., spaces in output strings).
>
>
> I have seen some *.iq files as resources in the Calcite source code, 
> are these used in the same way for testing in Calcite?  If yes, that's 
> great, and there should be many more.
>
>
> Calcite has lots of unit tests, but I find that our "end-to-end" tests 
> are easier to write and have much better code coverage. For example, 
> for each failure I find in Calcite I have to dig pretty hard to figure 
> out which of the testing files should contain my reproduction (e.g., 
> SqlOperatorTest, RelOptRulesTest, RelToSqlConverterTest, etc.), and 
> how to exactly write the reproduction (some reproductions require 
> editing some huge XML files too).
> That takes a lot of time.
>
>
> I am not necessarily signing up to build this infrastructure, at least 
> not right away. I am not sure how much of the stuff I wrote for our 
> compiler could be ported directly to Calcite. But I will think about 
> it, and I will gladly help this effort.
>
>
> I appreciate any comments and suggestions,
>
> Mihai

Reply via email to