One more thing. The geospatial library, including Bertil's proposed PostGIS dialect [5], would also benefit from this effort. We would want to ensure that "ST_Contains(ST_Point(0.0, 0.0), ST_Point(0.0, 0.0))" returns the same value on PostGIS as Calcite. Does testContainers support PostGIS?
Julian [5] https://issues.apache.org/jira/browse/CALCITE-6239 On Mon, Feb 26, 2024 at 11:07 AM Julian Hyde <jh...@apache.org> wrote: > > Thank you, everyone for your thoughts so far. (And let's keep talking!) > > I don't yet see a full, perfect solution. But I see a number of > techniques/tools we can use. For example, I was not aware of > testContainers but it seems to very quickly address our most common > use cases (MySQL + Postgres). Also, while I like Quidem it probably > should not be our first solution to this problem. > > I have two concerns about testContainers: > 1. it makes our test suite heavier: it adds latency, memory > requirements, flakiness, and the requirement for Docker; > 2. it doesn't solve other important dialects, such as BigQuery. > > To solve (1), we could add the tests to an optional module, such as > 'plus'. To solve (2) we need to provide another way to get a > connection to a reference database. (We need that for the other huge > project, the converse of this project, to test whether Calcite's > generated SQL is compliant [1].) > > I think we should start by merging Stamatis' PR into the 'plus' > module, and add some tests for CONCAT, which we have already > implemented, and has different behaviors in MySQL, Postgres, Oracle > [2][3]. Let's enable that test in CI and make sure that it doesn't > introduce flakiness. > > Next, we could make that test invoke tests that are already defined in > SqlOperatorTest, in addition to/instead of, the CSV data in Stamatis' > PR. > > Next, devise a way to test against Spark SQL and BigQuery, and check > our implementation of SOUNDEX (different in Spark than > Postgres/Oracle) and SUBSTR/SUBSTRING (different on Postgres/BigQuery) > [4]. > > After that, Quidem, somehow. > > Julian > > [1] https://issues.apache.org/jira/browse/CALCITE-5529 > [2] https://issues.apache.org/jira/browse/CALCITE-5771 > [3] https://issues.apache.org/jira/browse/CALCITE-5745 > [4] https://issues.apache.org/jira/browse/CALCITE-4427 > > On Mon, Feb 26, 2024 at 10:14 AM Mihai Budiu <mbu...@gmail.com> wrote: > > > > I have already contributed an implementation of Sql Logic Test for Calcite, > > it is part of the "plus" project of Calcite. > > > > SLT is completely complementary with the problem described here. SLT tests > > only the very core of the SQL language, which should be common to all > > implementations. It only uses 3 data types (integer, string, decimal), and > > it never calls any non-standard functions. > > > > If you configure Calcite in a specific way, it can pass all SLT tests. But > > some optimization passes are unsound, and some of these are used in the > > default configuration (when you use Calcite through a JDBC connection). > > That's the reason why currently the SLT tests are not being regularly run. > > > > In our project we took a very similar approach to the one described by > > Julian: when we adopt a SQL function from a dialect (we attempt to support > > functions from many dialects), we test it thoroughly on the original > > database and write all the same tests using our Calcite implementation. If > > something doesn't match, I file an issue with Calcite. > > > > We test > > > > * > > compile-time evaluation (for constant expressions) > > * > > run-time evaluation (including of the same expressions) > > * > > tests that should fail at compilation time (e.g. type errors) > > * > > tests that should fail at runtime (e.g., division by zero) > > > > We try to take the tests from the official test suites, e.g.: Postgres > > regress: > > https://github.com/postgres/postgres/tree/bcd5b4bcbefc8b99be3678cf9c6c990a0c28f68e/src/test/regress/expected > > We have written parsers that read the expected output from the official > > tests so that we can validate the results (for positive tests) while > > reusing as much as possible the original tests without introducing bugs in > > the translation of the tests. > > > > As Yiwen Wu says, in general it is impossible to execute a function for a > > specific dialect on a different runtime and get the same result. But this > > does not mean we shouldn't implement the functions precisely. In our > > runtime we reimplement all the functions in Rust, and we can support > > functions from different runtimes with no problem. But we still need the > > Java implementation to match the Rust implementation and with the original > > database implementation. > > > > Quidem may be the best way to do it, but it needs negative tests too. > > > > Mihai > > > > ________________________________ > > From: Stamatis Zampetakis <zabe...@gmail.com> > > Sent: Monday, February 26, 2024 5:31 AM > > To: dev@calcite.apache.org <dev@calcite.apache.org> > > Subject: Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL > > systems > > > > The Quidem approach definitely makes sense. > > > > Alternatively, since we are focusing on the behavior of functions with > > certain inputs it may be simpler to go with a Java parameterized test > > backed by a CSV/JSON file ([1] outlines the general idea). > > As others mentioned, using the org.testcontainers framework can > > significantly alleviate the burden of setting up and managing > > different DBMS. > > > > Best, > > Stamatis > > > > [1] https://github.com/apache/calcite/pull/3704 > > > > > > On Mon, Feb 26, 2024 at 4:01 AM Yiwen Wu <yiwenwu1...@gmail.com> wrote: > > > > > > I think it is difficult and impossible to make calcite function results > > > completely consistent with other SQL systems. In practice, when > > > calculations are pushed down to different data source adapters for > > > execution, there is no guarantee that the final results will be completely > > > consistent in calcite. > > > > > > I think, for the built-in functions defined in `SqlStdOperatorTable`, we > > > can choose to follow one of the definitive SQL systems, such as Oracle. > > > For > > > the engine-related extension functions in `SqlLibraryOperators`, we can > > > follow the behavior related to the specific engine. If an already defined > > > function behaves inconsistently with the new engine, we can add a new > > > Function to the definition, such as the `CONCAT` function. > > > > > > At the same time, I agree it is a great suggestion to add execution tests > > > of different engines in Quidem, which is very effective in verifying the > > > engine execution behavior. > > > > > > Cancai Cai <caic68...@gmail.com> 于2024年2月26日周一 09:59写道: > > > > > > > Thank you very much to the calcite community for raising these > > > > questions. > > > > This is what I have been doubting. I am very sorry that this doubt has > > > > been > > > > discussed for so long. > > > > > > > > Maybe we also need to consider another issue, that is, the database > > > > version > > > > issue. Versions like mysql and postgres are very stable, but components > > > > like spark still seem to have function bugs. exists, then how should we > > > > consider version issues? > > > > > > > > I don't know what I can do. Recently I am sorting out some documents > > > > about > > > > the use of some functions of mysql and postgres in calcite. I don't > > > > know if > > > > this is helpful. > > > > > > > > Best wishes, > > > > Cancai Cai > > > >