You may consider looking into Coral [1], which is based on Calcite but extends it with the translation requirements in mind, treating them as first-class problems.
[1] https://github.com/linkedin/coral

Thanks,
Walaa.

On Wed, Nov 27, 2024 at 9:20 AM Mihai Budiu <mbu...@gmail.com> wrote:

> I think that the task of translation between SQL dialects is hopeless.
> Consider just the type DECIMAL:
>
> * Each dialect supports a different range of DECIMAL types, specified by
>   scale and precision.
> * Some dialects support negative scales, and some support scales greater
>   than the precision.
> * Some dialects, like Postgres, have infinite values in DECIMAL. This
>   means that DECIMAL arithmetic can never cause a runtime error in
>   Postgres, unlike in all other dialects.
> * The rounding rules for converting between DECIMAL and other types
>   differ between dialects.
> * The typing rules for DECIMAL arithmetic differ between dialects: what
>   is the type of a/b when a and b are DECIMALs?
> * The implicit typing of literals differs between dialects: is 1.0e0 a
>   DECIMAL or some floating-point value?
>
> The conclusion is that a SQL dialect can express many programs that
> cannot be expressed at all in other SQL dialects. What is a translator
> supposed to do with such programs?
>
> These are the problems for just one type; when you consider the full set
> of supported types, the problem is exponentially larger.
>
> Just look at the differences between functions in dialects:
> https://calcite.apache.org/docs/reference.html. Calcite has 4 versions
> of CONCAT_WS, with subtle semantic differences.
>
> Are you aware of any tool, anywhere, that translates between 2 fixed SQL
> dialects while preserving the program's exact behavior in all cases?
> Building a tool for all possible dialects is again exponentially harder.
>
> Since we have established that exact semantics-preserving translation of
> SQL programs between dialects is impossible, you need to set a goal for
> the translation effort. What do you really hope to achieve? Then we can
> discuss how it can be done.
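The a/b typing problem mentioned above is visible even outside SQL. As an illustration (plain Java, not Calcite code): `java.math.BigDecimal` refuses to divide at all unless the caller fixes a result scale and rounding mode, because 1/3 has no exact decimal representation. That is precisely the decision each SQL dialect bakes in differently for DECIMAL division.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalDivision {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1");
        BigDecimal b = new BigDecimal("3");

        // Without an explicit result type, 1/3 has no exact DECIMAL value,
        // so exact division throws ArithmeticException.
        try {
            a.divide(b);
        } catch (ArithmeticException e) {
            System.out.println("exact division failed: " + e.getMessage());
        }

        // Each SQL dialect resolves this by picking a scale and a rounding
        // rule for a/b; different choices give different results.
        System.out.println(a.divide(b, 6, RoundingMode.HALF_UP));   // 0.333333
        System.out.println(a.divide(b, 2, RoundingMode.HALF_EVEN)); // 0.33
    }
}
```

Any translator that moves a DECIMAL division between dialects must therefore either reproduce the source dialect's scale-and-rounding choice explicitly, or accept a change in results.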
>
> Mihai
>
> ________________________________
> From: Yanjing Wang <zhuangzixiao...@gmail.com>
> Sent: Wednesday, November 27, 2024 1:02 AM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
>
> Hello, this discussion has been ongoing for a week. Let's move it
> forward. Does anyone else have any suggestions?
>
> Yanjing Wang <zhuangzixiao...@gmail.com> wrote on Tue, Nov 19, 2024 at
> 14:47:
>
> > 1. RelToSqlConverterTest
> >
> > The class name implies that it tests conversion from RelNode to SQL,
> > but now its RelNode comes from different dialects along with the
> > target SQL. It is difficult for me to understand a test case such as:
> >
> >   @Test void testNullCollation() {
> >     final String query = "select * from \"product\" order by \"brand_name\"";
> >     final String expected = "SELECT *\n"
> >         + "FROM \"foodmart\".\"product\"\n"
> >         + "ORDER BY \"brand_name\"";
> >     final String sparkExpected = "SELECT *\n"
> >         + "FROM `foodmart`.`product`\n"
> >         + "ORDER BY `brand_name` NULLS LAST";
> >     sql(query)
> >         .withPresto().ok(expected)
> >         .withSpark().ok(sparkExpected);
> >   }
> >
> > Why does the Spark SQL have 'NULLS LAST' at the end? That information
> > is missing if we don't add a source rel or source dialect.
> >
> > 2. Dialect-to-dialect translation
> >
> > I think it's necessary. Dialect translation and materialized-view
> > substitution are common in the big-data domain, and it would be
> > beneficial to make Calcite more user-friendly for these scenarios.
> > Could we create end-to-end test cases that start with the source SQL
> > of one dialect and end with the target SQL of another (or the same)
> > dialect? We could also include user-defined materialized views in the
> > process and perform result comparison.
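The 'NULLS LAST' question above appears to come down to differing defaults: the collation in the RelNode is nulls-last, which (as I understand it) matches Presto's default ascending sort but not Spark's, where nulls sort first by default, so the converter must emit NULLS LAST explicitly for Spark to preserve the same result order. A stdlib Java sketch (an illustration of the two orderings, not Calcite code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullOrdering {
    public static void main(String[] args) {
        List<String> brands = Arrays.asList("Best", null, "Acme");

        // Spark-style default for ascending sort: nulls first.
        List<String> nullsFirst = new ArrayList<>(brands);
        nullsFirst.sort(Comparator.nullsFirst(Comparator.naturalOrder()));
        System.out.println(nullsFirst); // [null, Acme, Best]

        // Presto-style default for ascending sort: nulls last. To get this
        // order from Spark, the generated SQL must say NULLS LAST explicitly.
        List<String> nullsLast = new ArrayList<>(brands);
        nullsLast.sort(Comparator.nullsLast(Comparator.naturalOrder()));
        System.out.println(nullsLast); // [Acme, Best, null]
    }
}
```

This is why the same collation yields a bare ORDER BY for one dialect and an explicit NULLS clause for another: the clause is only redundant where it matches the dialect's default.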
> >
> > Julian Hyde <jhyde.apa...@gmail.com> wrote on Tue, Nov 19, 2024 at
> > 07:21:
> >
> >> A recent case, https://issues.apache.org/jira/browse/CALCITE-6693,
> >> "Add Source SQL Dialect to RelToSqlConverterTest", implies that people
> >> are using Calcite to translate SQL from one dialect to another. The
> >> test wanted to test translating a SQL string from Presto to Redshift.
> >> I pushed back on that case (and its PR) because that test is for
> >> translating RelNode to a SQL dialect, not about handling source
> >> dialects.
> >>
> >> Dialect-to-dialect translation is undoubtedly something that people do
> >> with Calcite. I think we should recognize that fact and document how
> >> someone can use Calcite as a translator. When we have documented it,
> >> we can also add some tests.
> >>
> >> I am also worried about our dialect tests in general. The surface area
> >> to be tested is huge, and the tests are added haphazardly, so while
> >> many cases are tested there is a much greater set of cases that are
> >> not. Consider, for example, how testSelectQueryWithGroupByEmpty [1]
> >> tests against MySQL, Presto and StarRocks but not against BigQuery,
> >> Snowflake or Postgres. If we want our SQL dialect support to be high
> >> quality, we have to find a way to improve the coverage of our tests. I
> >> logged https://issues.apache.org/jira/browse/CALCITE-5529 with some
> >> ideas, but I need help implementing it.
> >>
> >> Julian
> >>
> >> [1]
> >> https://github.com/apache/calcite/blob/f2ec11fe7e23ecf2db903bc02c40609242993aad/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java#L577