My claim was that it is impossible even to implement a translation between any 
two dialects such that the resulting program produces exactly the same results 
as the source program.

Clearly, I don't know all SQL dialects, but I think this is true for all major 
dialects.

Mihai

________________________________
From: Yanjing Wang <zhuangzixiao...@gmail.com>
Sent: Sunday, December 1, 2024 7:10 PM
To: dev@calcite.apache.org <dev@calcite.apache.org>
Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing

Thank you, Mihai, for your detailed insights.

Implementing translations for all dialects indeed requires a significant
amount of work and can be quite challenging to complete. Would it be
possible for us to create an example that demonstrates how to implement
translation between two specific dialects and materialized view
substitution from scratch? This example could potentially be enhanced with
contributions from various companies and be friendly to newbies.
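
As a concrete starting point, here is a minimal sketch of the purely
syntactic path: parse with a source-dialect parser configuration, then
unparse through a target SqlDialect. The query and the MySQL-to-Postgres
pair below are only placeholders, and materialized view substitution would
additionally require going through the validator, SqlToRelConverter and
RelToSqlConverter.

  import org.apache.calcite.config.Lex;
  import org.apache.calcite.sql.SqlNode;
  import org.apache.calcite.sql.dialect.PostgresqlSqlDialect;
  import org.apache.calcite.sql.parser.SqlParseException;
  import org.apache.calcite.sql.parser.SqlParser;

  public class DialectTranslationSketch {
    public static void main(String[] args) throws SqlParseException {
      // Parse using MySQL-style lexical conventions (back-tick quoting etc.).
      SqlParser parser = SqlParser.create(
          "select `brand_name` from `foodmart`.`product` order by `brand_name`",
          SqlParser.config().withLex(Lex.MYSQL));
      SqlNode ast = parser.parseQuery();
      // Unparse the same AST using the target dialect's quoting and keywords.
      System.out.println(
          ast.toSqlString(PostgresqlSqlDialect.DEFAULT).getSql());
    }
  }

This only handles syntax (quoting, keywords, some function rewrites); it does
not address the semantic differences discussed elsewhere in this thread.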


Thank you, Walaa, for your recommendation. I understand that Coral uses
ANTLR4 for parsing and translating from ASTNode to Calcite SqlNode. Do you
find this approach easier compared to defining a Calcite FTL file for
unsupported syntax?

Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote on Thursday, November 28, 2024 at 01:27:

> You may consider looking into Coral [1] which is based on Calcite, but
> extends it with the translation requirements in mind, and as first class
> problems.
>
> [1] https://github.com/linkedin/coral
>
> Thanks,
> Walaa.
>
>
> On Wed, Nov 27, 2024 at 9:20 AM Mihai Budiu <mbu...@gmail.com> wrote:
>
> > I think that the task of translation between SQL dialects is hopeless.
> > Consider just the type DECIMAL:
> >
> >
> >   * Each dialect supports a different range of DECIMAL types, specified
> >     by scale and precision.
> >   * Some dialects support negative scales, and some support
> >     scales > precision.
> >   * Some dialects, like Postgres, have infinite values in DECIMAL. This
> >     means that DECIMAL arithmetic can never cause a runtime error in
> >     Postgres, unlike all other dialects.
> >   * The rounding rules for converting between DECIMAL and other types
> >     differ between dialects.
> >   * The typing rules for DECIMAL arithmetic differ between dialects: what
> >     is the type of a/b when a and b are DECIMALs? (See the sketch after
> >     this list.)
> >   * The implicit typing of literals is different between dialects: is
> >     1.0e0 a DECIMAL or some floating point value?
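> >
> > As a concrete illustration of the a/b question: here is a minimal sketch
> > against Calcite's own default type system (each dialect can plug in a
> > different RelDataTypeSystem with different formulas); the DECIMAL(19, 0)
> > operands are arbitrary.
> >
> >   // Uses org.apache.calcite.rel.type.* and org.apache.calcite.sql.type.*.
> >   RelDataTypeFactory typeFactory =
> >       new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
> >   RelDataType dec = typeFactory.createSqlType(SqlTypeName.DECIMAL, 19, 0);
> >   // Ask the type system which type it assigns to DECIMAL / DECIMAL.
> >   RelDataType quotient = typeFactory.getTypeSystem()
> >       .deriveDecimalDivideType(typeFactory, dec, dec);
> >   System.out.println(quotient.getFullTypeString());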
> >
> > The conclusion is that a SQL dialect can express many programs that
> > cannot be expressed at all in other SQL dialects. What is a translator
> > supposed to do with such programs?
> >
> > These are the problems for just one type; when you consider the full set
> > of supported types, the problem is exponentially larger.
> >
> > Just look at the differences between functions in dialects:
> > https://calcite.apache.org/docs/reference.html
> > Calcite has 4 versions of CONCAT_WS, with subtle semantic differences.
> >
> > Are you aware of any tool, anywhere, that translates between two fixed
> > SQL dialects while preserving the program's exact behavior in all cases?
> > Building a tool for all possible dialects is exponentially harder still.
> >
> > Since we have established that exact semantics-preserving transformation
> > of SQL programs between dialects is impossible, you need to set a goal
> > for the translation effort. What do you really hope to achieve? Then we
> > can discuss how it can be done.
> >
> > Mihai
> >
> > ________________________________
> > From: Yanjing Wang <zhuangzixiao...@gmail.com>
> > Sent: Wednesday, November 27, 2024 1:02 AM
> > To: dev@calcite.apache.org <dev@calcite.apache.org>
> > Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
> >
> > Hello, this discussion has been ongoing for a week. Let's move it
> > forward. Does anyone else have any suggestions?
> >
> > Yanjing Wang <zhuangzixiao...@gmail.com> wrote on Tuesday, November 19, 2024 at 14:47:
> >
> > > 1. RelToSqlConverterTest
> > > The class name implies that it tests conversion from RelNode to SQL,
> > > but now its RelNode comes from different dialects along with the target
> > > SQL. It is difficult for me to understand a test case like this:
> > >
> > > @Test void testNullCollation() {
> > >   final String query =
> > >       "select * from \"product\" order by \"brand_name\"";
> > >   final String expected = "SELECT *\n"
> > >       + "FROM \"foodmart\".\"product\"\n"
> > >       + "ORDER BY \"brand_name\"";
> > >   final String sparkExpected = "SELECT *\n"
> > >       + "FROM `foodmart`.`product`\n"
> > >       + "ORDER BY `brand_name` NULLS LAST";
> > >   sql(query)
> > >       .withPresto().ok(expected)
> > >       .withSpark().ok(sparkExpected);
> > > }
> > >
> > >
> > > Why does the Spark SQL have 'NULLS LAST' at the end? That information
> > > is missing if we don't add a source rel or source dialect.
> > >
> > > 2. Dialect-to-dialect translation
> > > I think it's necessary. Dialect translation and materialized view
> > > substitution are common in the big data domain, and it would be
> > > beneficial to make Calcite more user-friendly for these scenarios.
> > > Could we create end-to-end test cases that start with the source SQL of
> > > one dialect and end with the target SQL of another (or the same)
> > > dialect? We could also include user-defined materialized views in the
> > > process and perform result comparison.
> > >
> > > Julian Hyde <jhyde.apa...@gmail.com> wrote on Tuesday, November 19, 2024 at 07:21:
> > >
> > >> A recent case, https://issues.apache.org/jira/browse/CALCITE-6693,
> > >> "Add Source SQL Dialect to RelToSqlConverterTest", implies that people
> > >> are using Calcite to translate SQL from one dialect to another. The
> > >> test wanted to test translating a SQL string from Presto to Redshift.
> > >> I pushed back on that case (and its PR) because that test is for
> > >> translating RelNode to a SQL dialect, not about handling source
> > >> dialects.
> > >>
> > >> Dialect-to-dialect translation is undoubtedly something that people do
> > >> with Calcite. I think we should recognize that fact, and document how
> > >> someone can use Calcite as a translator. When we have documented it, we
> > >> can also add some tests.
> > >>
> > >> I am also worried about our dialect tests in general. The surface area
> > >> to be tested is huge, and the tests are added haphazardly, so while
> > >> many cases are tested, there is a much greater set of cases that are
> > >> not. Consider, for example, how testSelectQueryWithGroupByEmpty [1]
> > >> tests against MySQL, Presto and StarRocks but not against BigQuery,
> > >> Snowflake or Postgres. If we want our SQL dialect support to be high
> > >> quality, we have to find a way to improve the coverage of our tests. I
> > >> logged https://issues.apache.org/jira/browse/CALCITE-5529 with some
> > >> ideas but I need help implementing it.
> > >>
> > >> Julian
> > >>
> > >> [1]
> > >> https://github.com/apache/calcite/blob/f2ec11fe7e23ecf2db903bc02c40609242993aad/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java#L577
> > >>
> > >>
> > >>
> >
>
