Re: Easier and more comprehensive testing

Stamatis Zampetakis Thu, 03 Aug 2023 02:42:53 -0700

Hey Mihai,

Indeed the .iq files are mostly end-to-end tests in Calcite. The
Quidem [1] framework is used for running these tests. The Quidem
approach is pretty neat cause we don't have to carry Java boilerplate
code and formatting around which makes the tests much easier to read,
write, inspect history, and copy-paste across projects. Separating
SQL/DDL/DML files in separate non-java files is used in other projects
as well. The most relevant example that comes to mind is Apache Hive
that has ~5K .q files (see cast4.q [2] for an example) and its own
test framework for running those.


End-to-end tests are great but so are unit tests. One does not replace
the other and both are needed in a project. End-to-end tests are much
slower than unit tests so if we have too many of them we will not be
able to run all of them during the build. Again Hive is nice example
on this front cause due to the big number of end-to-end tests
developers cannot run everything locally (it requires more than 24
hours). As a result, lots of PRs are created by running just a few
tests locally and then waiting for the CI to finish. If test failures
are detected, new commits must be pushed, and the whole process is
repeated; the whole cycle takes several days. Unit tests are there to
catch as many problems as possible early on and save us from this
lengthy back and forth.

The end-to-end tests in Calcite may be few compared to other projects
but we should take into account that Calcite is not a complete DBMS.
DBMS that are built using Calcite tend to have many more end-to-end
tests and they are essentially testing Calcite as well.

To conclude, I am strongly in favor of adding new tests especially if
they help to uncover new bugs and problems in the project. For
end-to-end tests, I would probably invest on Quidem (and .iq files) as
far as it concerns Calcite and I would always push for unit tests when
applicable.

Best,
Stamatis

[1] https://github.com/julianhyde/quidem
[2] 
https://github.com/apache/hive/blob/0447b88d8308be669eb9102637a65736ed2bfccf/ql/src/test/queries/clientpositive/cast4.q

On Sat, Jul 29, 2023 at 5:04 AM LakeShen <shenleifight...@gmail.com> wrote:
>
> Hi mbudiu,thank you for bringing SqlLogicTest to calcite.
> Now we're working on an optimizer based on Calcite that supports Postgresql
> semantics as a whole. I am very interested in your bringing the Postgres
> test to Calcite and would like to participate in it. Could you please give
> me a complete collection of reference materials?
> The current document information is scattered, some in sql-logic-test
> github and some in sql-to-dbsp-compiler github, which requires everyone to
> read on github by themselves. If you could give me a complete end-to-end
> demo of adding tests, I would really appreciate it. Best, LakeShen
>
> <mbu...@gmail.com> 于2023年7月29日周六 01:04写道：
>
> > The correct link is
> > https://github.com/feldera/dbsp/tree/main/sql-to-dbsp-compiler, but it's
> > not particularly important, I apologize for the broken one.
> >
> > I appreciate you pointing out the Sql Logic Test project, I wrote that
> > code too. But these two projects are almost entirely disjoint. SLT brings a
> > few million pre-written tests. But none of the SLT tests exercises any of
> > the SQL functions. So you need to write many more tests for each *function*
> > implemented, and most contributions to Calcite lately are functions in
> > different dialects.
> >
> > Moreover, while SLT found quite a few bugs, I only had time to file one of
> > them so far. For each failing test I have to figure out (1) whether it's a
> > duplicate, (2) to create a minimal reproduction, (3) to figure out which
> > part of the compiler is faulty in order to understand where to insert the
> > test, (4) create a reproduction tailored for the faulty  component.
> > Reproductions for planner bugs are different than reproductions for library
> > bugs. But some bugs happen only if you combine some planner rules with some
> > functions.
> >
> > With the proposal below you only need a CLI to a database to generate new
> > tests (e.g., BigQuery): you execute write the queries in the CLI, then
> > copy-paste the output into a test. This makes it easier to file bugs, but
> > harder to diagnose them. But it also makes it much easier to write lots of
> > tests, because it enables people without Calcite expertise to write the
> > tests.
> >
> > Mihai
> >
> > -----Original Message-----
> > From: stanilovsky evgeny
> > Sent: Friday, July 28, 2023 2:43 AM
> > To: dev@calcite.apache.org
> > Subject: Re: Easier and more comprehensive testing
> >
> > Hello, your github link doesn`t open.
> > plz check discussion here in dev list titled:
> >
> > Running Sql Logic Tests for Calcite
> > This is the JIRA case: https://issues.apache.org/jira/browse/CALCITE-5615
> > And this is the PR: https://github.com/apache/calcite/pull/3145
> >
> >
> > > Hello,
> > >
> > >
> > > I am working to test our calcite-based compiler
> > > (https://github.com/feldera/dbsp/sql-to-dbsp-compiler), and since I am
> > > lazy I am borrowing tests from other open-source test suites, like
> > > Postgres.
> > >
> > > I am finding bugs in Calcite with a relatively high frequency, as you
> > > may have noticed if you follow the JIRA. I would say that one in 3 SQL
> > > functions I test turns out to have some problems. So our techniques
> > > seem to be effective at finding bugs.
> > >
> > > I think that some of the techniques we are using could be applied to
> > > Calcite as well:
> > >
> > >
> > > *     First, we run all tests with and without the optimizer. In
> > > particular, we go through all the "constant evaluation" rules of Calcite.
> > > For constant expressions the results should be identical with and
> > > without optimizer. They aren't always, and we have found quite a few
> > > cases where the compile-time evaluation crashes or produces wrong
> > > results. So the compile-time and the run-time evaluator cross-check
> > > each other.
> > > *     Second, we have tried to make it very easy to write end-to-end
> > > tests, at least positive tests (which are supposed to return a value
> > > rather than an error). Here is a stylized excerpt from our testing
> > > code:
> > >
> > >
> > > @Override
> > >
> > > public void prepareData() {
> > >
> > >         this.executeStatements("CREATE TABLE FLOAT4_TBL (f1
> > float4);\n"
> > > +
> > >
> > >                 "INSERT INTO FLOAT4_TBL(f1) VALUES ('    0.0');\n" +
> > >
> > >                 "INSERT INTO FLOAT4_TBL(f1) VALUES ('1004.30   ');\n" +
> > >
> > >                 "INSERT INTO FLOAT4_TBL(f1) VALUES ('     -34.84
> > > ');\n" +
> > >
> > >                 "INSERT INTO FLOAT4_TBL(f1) VALUES
> > > ('1.2345678901234e+20');\n" +
> > >
> > >                 "INSERT INTO FLOAT4_TBL(f1) VALUES
> > > ('1.2345678901234e-20');");
> > >
> > > }
> > >
> > >
> > > @Test
> > >
> > > public void testFPArithmetic() {
> > >
> > >         this.qs("SELECT f.f1, f.f1 * '-10' AS x FROM FLOAT4_TBL f\n" +
> > >
> > >                 "   WHERE f.f1 > '0.0';\n" +
> > >
> > >                 "      f1       |       x        \n" +
> > >
> > >                 "---------------+----------------\n" +
> > >
> > >                 "        1004.3 |         -10043\n" +
> > >
> > >                 " 1.2345679e+20 | -1.2345678e+21\n" +
> > >
> > >                 " 1.2345679e-20 | -1.2345678e-19\n" +
> > >
> > >                 "(3 rows)\n" +
> > >
> > >                 "\n" +
> > >
> > >                 "SELECT f.f1, f.f1 + '-10' AS x FROM FLOAT4_TBL f\n" +
> > >
> > >                 "   WHERE f.f1 > '0.0';\n" +
> > >
> > >                 "      f1       |       x       \n" +
> > >
> > >                 "---------------+---------------\n" +
> > >
> > >                 "        1004.3 |         994.3\n" +
> > >
> > >                 " 1.2345679e+20 | 1.2345679e+20\n" +
> > >
> > >                 " 1.2345679e-20 |           -10\n" +
> > >
> > >                 "(3 rows)\n" +
> > >
> > >                 "\n");
> > >
> > >        }
> > >
> > >
> > > These long strings are an almost direct copy-and-paste from tests in
> > > https://github.com/postgres/postgres/blob/03734a7fed7d924679770adb78a7
> > > db8a37 d14188/src/test/regress/expected/float4.out (but they have to
> > > manually checked, since Calcite does often not match Postgres
> > > behaviors). A string can contain multiple query-expected result
> > > strings.
> > >
> > >
> > > The point is that the barrier for writing the tests is quite low.
> > >
> > > Granted, this approach has some weaknesses as well, in particular, we
> > > rely on a specific output format and is brittle in some respects
> > > (e.g., spaces in output strings).
> > >
> > >
> > > I have seen some *.iq files as resources in the Calcite source code,
> > > are these used in the same way for testing in Calcite?  If yes, that's
> > > great, and there should be many more.
> > >
> > >
> > > Calcite has lots of unit tests, but I find that our "end-to-end" tests
> > > are easier to write and have much better code coverage. For example,
> > > for each failure I find in Calcite I have to dig pretty hard to figure
> > > out which of the testing files should contain my reproduction (e.g.,
> > > SqlOperatorTest, RelOptRulesTest, RelToSqlConverterTest, etc.), and
> > > how to exactly write the reproduction (some reproductions require
> > > editing some huge XML files too).
> > > That takes a lot of time.
> > >
> > >
> > > I am not necessarily signing up to build this infrastructure, at least
> > > not right away. I am not sure how much of the stuff I wrote for our
> > > compiler could be ported directly to Calcite. But I will think about
> > > it, and I will gladly help this effort.
> > >
> > >
> > > I appreciate any comments and suggestions,
> > >
> > > Mihai
> >
> >

Re: Easier and more comprehensive testing

Reply via email to