Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

Julian Hyde Mon, 26 Feb 2024 11:20:25 -0800

One more thing. The geospatial library, including Bertil's proposed
PostGIS dialect [5], would also benefit from this effort. We would
want to ensure that "ST_Contains(ST_Point(0.0, 0.0), ST_Point(0.0,
0.0))" returns the same value on PostGIS as Calcite. Does
testContainers support PostGIS?


Julian

[5] https://issues.apache.org/jira/browse/CALCITE-6239

On Mon, Feb 26, 2024 at 11:07 AM Julian Hyde <jh...@apache.org> wrote:
>
> Thank you, everyone for your thoughts so far. (And let's keep talking!)
>
> I don't yet see a full, perfect solution. But I see a number of
> techniques/tools we can use. For example, I was not aware of
> testContainers but it seems to very quickly address our most common
> use cases (MySQL + Postgres). Also, while I like Quidem it probably
> should not be our first solution to this problem.
>
> I have two concerns about testContainers:
> 1. it makes our test suite heavier: it adds latency, memory
> requirements, flakiness, and the requirement for Docker;
> 2. it doesn't solve other important dialects, such as BigQuery.
>
> To solve (1), we could add the tests to an optional module, such as
> 'plus'. To solve (2) we need to provide another way to get a
> connection to a reference database. (We need that for the other huge
> project, the converse of this project, to test whether Calcite's
> generated SQL is compliant [1].)
>
> I think we should start by merging Stamatis' PR into the 'plus'
> module, and add some tests for CONCAT, which we have already
> implemented, and has different behaviors in MySQL, Postgres, Oracle
> [2][3]. Let's enable that test in CI and make sure that it doesn't
> introduce flakiness.
>
> Next, we could make that test invoke tests that are already defined in
> SqlOperatorTest, in addition to/instead of, the CSV data in Stamatis'
> PR.
>
> Next, devise a way to test against Spark SQL and BigQuery, and check
> our implementation of SOUNDEX (different in Spark than
> Postgres/Oracle) and SUBSTR/SUBSTRING (different on Postgres/BigQuery)
> [4].
>
> After that, Quidem, somehow.
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-5529
> [2] https://issues.apache.org/jira/browse/CALCITE-5771
> [3] https://issues.apache.org/jira/browse/CALCITE-5745
> [4] https://issues.apache.org/jira/browse/CALCITE-4427
>
> On Mon, Feb 26, 2024 at 10:14 AM Mihai Budiu <mbu...@gmail.com> wrote:
> >
> > I have already contributed an implementation of Sql Logic Test for Calcite, 
> > it is part of the "plus" project of Calcite.
> >
> > SLT is completely complementary with the problem described here. SLT tests 
> > only the very core of the SQL language, which should be common to all 
> > implementations. It only uses 3 data types (integer, string, decimal), and 
> > it never calls any non-standard functions.
> >
> > If you configure Calcite in a specific way, it can pass all SLT tests. But 
> > some optimization passes are unsound, and some of these are used in the 
> > default configuration (when you use Calcite through a JDBC connection). 
> > That's the reason why currently the SLT tests are not being regularly run.
> >
> > In our project we took a very similar approach to the one described by 
> > Julian: when we adopt a SQL function from a dialect (we attempt to support 
> > functions from many dialects), we test it thoroughly on the original 
> > database and write all the same tests using our Calcite implementation. If 
> > something doesn't match, I file an issue with Calcite.
> >
> > We test
> >
> >   *
> > compile-time evaluation (for constant expressions)
> >   *
> > run-time evaluation (including of the same expressions)
> >   *
> > tests that should fail at compilation time (e.g. type errors)
> >   *
> > tests that should fail at runtime (e.g., division by zero)
> >
> > We try to take the tests from the official test suites, e.g.: Postgres 
> > regress: 
> > https://github.com/postgres/postgres/tree/bcd5b4bcbefc8b99be3678cf9c6c990a0c28f68e/src/test/regress/expected
> > We have written parsers that read the expected output from the official 
> > tests so that we can validate the results (for positive tests) while 
> > reusing as much as possible the original tests without introducing bugs in 
> > the translation of the tests.
> >
> > As Yiwen Wu says, in general it is impossible to execute a function for a 
> > specific dialect on a different runtime and get the same result. But this 
> > does not mean we shouldn't implement the functions precisely. In our 
> > runtime we reimplement all the functions in Rust, and we can support 
> > functions from different runtimes with no problem. But we still need the 
> > Java implementation to match the Rust implementation and with the original 
> > database implementation.
> >
> > Quidem may be the best way to do it, but it needs negative tests too.
> >
> > Mihai
> >
> > ________________________________
> > From: Stamatis Zampetakis <zabe...@gmail.com>
> > Sent: Monday, February 26, 2024 5:31 AM
> > To: dev@calcite.apache.org <dev@calcite.apache.org>
> > Subject: Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL 
> > systems
> >
> > The Quidem approach definitely makes sense.
> >
> > Alternatively, since we are focusing on the behavior of functions with
> > certain inputs it may be simpler to go with a Java parameterized test
> > backed by a CSV/JSON file ([1] outlines the general idea).
> > As others mentioned, using the org.testcontainers framework can
> > significantly alleviate the burden of setting up and managing
> > different DBMS.
> >
> > Best,
> > Stamatis
> >
> > [1] https://github.com/apache/calcite/pull/3704
> >
> >
> > On Mon, Feb 26, 2024 at 4:01 AM Yiwen Wu <yiwenwu1...@gmail.com> wrote:
> > >
> > > I think it is difficult and impossible to make calcite function results
> > > completely consistent with other SQL systems. In practice, when
> > > calculations are pushed down to different data source adapters for
> > > execution, there is no guarantee that the final results will be completely
> > > consistent in calcite.
> > >
> > > I think, for the built-in functions defined in `SqlStdOperatorTable`, we
> > > can choose to follow one of the definitive SQL systems, such as Oracle. 
> > > For
> > > the engine-related extension functions in `SqlLibraryOperators`, we can
> > > follow the behavior related to the specific engine. If an already defined
> > > function behaves inconsistently with the new engine, we can add a new
> > > Function to the definition, such as the `CONCAT` function.
> > >
> > > At the same time, I agree it is a great suggestion to add execution tests
> > > of different engines in Quidem, which is very effective in verifying the
> > > engine execution behavior.
> > >
> > > Cancai Cai <caic68...@gmail.com> 于2024年2月26日周一 09:59写道：
> > >
> > > > Thank you very much to the calcite community for raising these 
> > > > questions.
> > > > This is what I have been doubting. I am very sorry that this doubt has 
> > > > been
> > > > discussed for so long.
> > > >
> > > > Maybe we also need to consider another issue, that is, the database 
> > > > version
> > > > issue. Versions like mysql and postgres are very stable, but components
> > > > like spark still seem to have function bugs. exists, then how should we
> > > > consider version issues?
> > > >
> > > > I don't know what I can do. Recently I am sorting out some documents 
> > > > about
> > > > the use of some functions of mysql and postgres in calcite. I don't 
> > > > know if
> > > > this is helpful.
> > > >
> > > > Best wishes,
> > > > Cancai Cai
> > > >

Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

Reply via email to