Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

Hanumath Maduri Mon, 26 Feb 2024 12:05:56 -0800

It seems to have support for PostGIS, but I didn't test it though.

https://github.com/testcontainers/testcontainers-java/blob/main/modules/postgresql/src/main/java/org/testcontainers/containers/PostgisContainerProvider.java
https://testcontainers.com/modules/postgis/


Thanks
Hanu

On Mon, Feb 26, 2024 at 11:20 AM Julian Hyde <jh...@apache.org> wrote:

> One more thing. The geospatial library, including Bertil's proposed
> PostGIS dialect [5], would also benefit from this effort. We would
> want to ensure that "ST_Contains(ST_Point(0.0, 0.0), ST_Point(0.0,
> 0.0))" returns the same value on PostGIS as Calcite. Does
> testContainers support PostGIS?
>
> Julian
>
> [5] https://issues.apache.org/jira/browse/CALCITE-6239
>
> On Mon, Feb 26, 2024 at 11:07 AM Julian Hyde <jh...@apache.org> wrote:
> >
> > Thank you, everyone for your thoughts so far. (And let's keep talking!)
> >
> > I don't yet see a full, perfect solution. But I see a number of
> > techniques/tools we can use. For example, I was not aware of
> > testContainers but it seems to very quickly address our most common
> > use cases (MySQL + Postgres). Also, while I like Quidem it probably
> > should not be our first solution to this problem.
> >
> > I have two concerns about testContainers:
> > 1. it makes our test suite heavier: it adds latency, memory
> > requirements, flakiness, and the requirement for Docker;
> > 2. it doesn't solve other important dialects, such as BigQuery.
> >
> > To solve (1), we could add the tests to an optional module, such as
> > 'plus'. To solve (2) we need to provide another way to get a
> > connection to a reference database. (We need that for the other huge
> > project, the converse of this project, to test whether Calcite's
> > generated SQL is compliant [1].)
> >
> > I think we should start by merging Stamatis' PR into the 'plus'
> > module, and add some tests for CONCAT, which we have already
> > implemented, and has different behaviors in MySQL, Postgres, Oracle
> > [2][3]. Let's enable that test in CI and make sure that it doesn't
> > introduce flakiness.
> >
> > Next, we could make that test invoke tests that are already defined in
> > SqlOperatorTest, in addition to/instead of, the CSV data in Stamatis'
> > PR.
> >
> > Next, devise a way to test against Spark SQL and BigQuery, and check
> > our implementation of SOUNDEX (different in Spark than
> > Postgres/Oracle) and SUBSTR/SUBSTRING (different on Postgres/BigQuery)
> > [4].
> >
> > After that, Quidem, somehow.
> >
> > Julian
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-5529
> > [2] https://issues.apache.org/jira/browse/CALCITE-5771
> > [3] https://issues.apache.org/jira/browse/CALCITE-5745
> > [4] https://issues.apache.org/jira/browse/CALCITE-4427
> >
> > On Mon, Feb 26, 2024 at 10:14 AM Mihai Budiu <mbu...@gmail.com> wrote:
> > >
> > > I have already contributed an implementation of Sql Logic Test for
> Calcite, it is part of the "plus" project of Calcite.
> > >
> > > SLT is completely complementary with the problem described here. SLT
> tests only the very core of the SQL language, which should be common to all
> implementations. It only uses 3 data types (integer, string, decimal), and
> it never calls any non-standard functions.
> > >
> > > If you configure Calcite in a specific way, it can pass all SLT tests.
> But some optimization passes are unsound, and some of these are used in the
> default configuration (when you use Calcite through a JDBC connection).
> That's the reason why currently the SLT tests are not being regularly run.
> > >
> > > In our project we took a very similar approach to the one described by
> Julian: when we adopt a SQL function from a dialect (we attempt to support
> functions from many dialects), we test it thoroughly on the original
> database and write all the same tests using our Calcite implementation. If
> something doesn't match, I file an issue with Calcite.
> > >
> > > We test
> > >
> > >   *
> > > compile-time evaluation (for constant expressions)
> > >   *
> > > run-time evaluation (including of the same expressions)
> > >   *
> > > tests that should fail at compilation time (e.g. type errors)
> > >   *
> > > tests that should fail at runtime (e.g., division by zero)
> > >
> > > We try to take the tests from the official test suites, e.g.: Postgres
> regress:
> https://github.com/postgres/postgres/tree/bcd5b4bcbefc8b99be3678cf9c6c990a0c28f68e/src/test/regress/expected
> > > We have written parsers that read the expected output from the
> official tests so that we can validate the results (for positive tests)
> while reusing as much as possible the original tests without introducing
> bugs in the translation of the tests.
> > >
> > > As Yiwen Wu says, in general it is impossible to execute a function
> for a specific dialect on a different runtime and get the same result. But
> this does not mean we shouldn't implement the functions precisely. In our
> runtime we reimplement all the functions in Rust, and we can support
> functions from different runtimes with no problem. But we still need the
> Java implementation to match the Rust implementation and with the original
> database implementation.
> > >
> > > Quidem may be the best way to do it, but it needs negative tests too.
> > >
> > > Mihai
> > >
> > > ________________________________
> > > From: Stamatis Zampetakis <zabe...@gmail.com>
> > > Sent: Monday, February 26, 2024 5:31 AM
> > > To: dev@calcite.apache.org <dev@calcite.apache.org>
> > > Subject: Re: [DISCUSS] Ensuring that Calcite is consistent with other
> SQL systems
> > >
> > > The Quidem approach definitely makes sense.
> > >
> > > Alternatively, since we are focusing on the behavior of functions with
> > > certain inputs it may be simpler to go with a Java parameterized test
> > > backed by a CSV/JSON file ([1] outlines the general idea).
> > > As others mentioned, using the org.testcontainers framework can
> > > significantly alleviate the burden of setting up and managing
> > > different DBMS.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://github.com/apache/calcite/pull/3704
> > >
> > >
> > > On Mon, Feb 26, 2024 at 4:01 AM Yiwen Wu <yiwenwu1...@gmail.com>
> wrote:
> > > >
> > > > I think it is difficult and impossible to make calcite function
> results
> > > > completely consistent with other SQL systems. In practice, when
> > > > calculations are pushed down to different data source adapters for
> > > > execution, there is no guarantee that the final results will be
> completely
> > > > consistent in calcite.
> > > >
> > > > I think, for the built-in functions defined in
> `SqlStdOperatorTable`, we
> > > > can choose to follow one of the definitive SQL systems, such as
> Oracle. For
> > > > the engine-related extension functions in `SqlLibraryOperators`, we
> can
> > > > follow the behavior related to the specific engine. If an already
> defined
> > > > function behaves inconsistently with the new engine, we can add a new
> > > > Function to the definition, such as the `CONCAT` function.
> > > >
> > > > At the same time, I agree it is a great suggestion to add execution
> tests
> > > > of different engines in Quidem, which is very effective in verifying
> the
> > > > engine execution behavior.
> > > >
> > > > Cancai Cai <caic68...@gmail.com> 于2024年2月26日周一 09:59写道：
> > > >
> > > > > Thank you very much to the calcite community for raising these
> questions.
> > > > > This is what I have been doubting. I am very sorry that this doubt
> has been
> > > > > discussed for so long.
> > > > >
> > > > > Maybe we also need to consider another issue, that is, the
> database version
> > > > > issue. Versions like mysql and postgres are very stable, but
> components
> > > > > like spark still seem to have function bugs. exists, then how
> should we
> > > > > consider version issues?
> > > > >
> > > > > I don't know what I can do. Recently I am sorting out some
> documents about
> > > > > the use of some functions of mysql and postgres in calcite. I
> don't know if
> > > > > this is helpful.
> > > > >
> > > > > Best wishes,
> > > > > Cancai Cai
> > > > >
>

Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

Reply via email to