You may consider looking into Coral [1], which is based on Calcite but extends it with the translation requirements in mind, treating them as first-class problems.
[1] https://github.com/linkedin/coral

Thanks,
Walaa.

On Wed, Nov 27, 2024 at 9:20 AM Mihai Budiu <mbu...@gmail.com> wrote:

> I think that the task of translation between SQL dialects is hopeless.
> Consider just the type DECIMAL:
>
> * Each dialect supports a different range of DECIMAL types, specified by
>   scale and precision.
> * Some dialects support negative scales, and some support scales greater
>   than the precision.
> * Some dialects, like Postgres, have infinite values in DECIMAL. This
>   means that DECIMAL arithmetic can never cause a runtime error in
>   Postgres, unlike in all other dialects.
> * The rounding rules for converting between DECIMAL and other types
>   differ between dialects.
> * The typing rules for DECIMAL arithmetic differ between dialects: what
>   is the type of a/b when a and b are DECIMALs?
> * The implicit typing of literals differs between dialects: is 1.0e0 a
>   DECIMAL or some floating-point value?
>
> The conclusion is that a SQL dialect can express many programs that
> cannot be expressed at all in other SQL dialects. What is a translator
> supposed to do with such programs?
>
> These are the problems for just one type; when you consider the full set
> of supported types, the problem is exponentially larger.
>
> Just look at the differences between functions in dialects:
> https://calcite.apache.org/docs/reference.html. Calcite has 4 versions
> of CONCAT_WS, with subtle semantic differences.
>
> Are you aware of any tool, anywhere, that translates between 2 fixed SQL
> dialects while preserving the program's exact behavior in all cases?
> Building a tool for all possible dialects is again exponentially harder.
>
> Since we have established that exact semantics-preserving translation of
> SQL programs between dialects is impossible, you need to set a goal for
> the translation effort. What do you really hope to achieve? Then we can
> discuss how it can be done.
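The a/b typing problem mentioned above is visible even outside SQL. As an illustration (plain Java, not Calcite code): `java.math.BigDecimal` refuses to divide at all unless the caller fixes a result scale and rounding mode, because 1/3 has no exact decimal representation. That is precisely the decision each SQL dialect bakes in differently for DECIMAL division.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalDivision {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1");
        BigDecimal b = new BigDecimal("3");

        // Without an explicit result type, 1/3 has no exact DECIMAL value,
        // so exact division throws ArithmeticException.
        try {
            a.divide(b);
        } catch (ArithmeticException e) {
            System.out.println("exact division failed: " + e.getMessage());
        }

        // Each SQL dialect resolves this by picking a scale and a rounding
        // rule for a/b; different choices give different results.
        System.out.println(a.divide(b, 6, RoundingMode.HALF_UP));   // 0.333333
        System.out.println(a.divide(b, 2, RoundingMode.HALF_EVEN)); // 0.33
    }
}
```

Any translator that moves a DECIMAL division between dialects must therefore either reproduce the source dialect's scale-and-rounding choice explicitly, or accept a change in results.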
>
> Mihai
>
> ________________________________
> From: Yanjing Wang <zhuangzixiao...@gmail.com>
> Sent: Wednesday, November 27, 2024 1:02 AM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: [DISCUSS] Calcite as SQL translator, and dialect testing
>
> Hello, this discussion has been ongoing for a week. Let's move it
> forward. Does anyone else have any suggestions?
>
> Yanjing Wang <zhuangzixiao...@gmail.com> wrote on Tue, Nov 19, 2024 at
> 14:47:
>
> > 1. RelToSqlConverterTest
> >
> > The class name implies that it tests conversion from RelNode to SQL,
> > but now its RelNode comes from different dialects along with the
> > target SQL. It is difficult for me to understand a test case such as:
> >
> >   @Test void testNullCollation() {
> >     final String query = "select * from \"product\" order by \"brand_name\"";
> >     final String expected = "SELECT *\n"
> >         + "FROM \"foodmart\".\"product\"\n"
> >         + "ORDER BY \"brand_name\"";
> >     final String sparkExpected = "SELECT *\n"
> >         + "FROM `foodmart`.`product`\n"
> >         + "ORDER BY `brand_name` NULLS LAST";
> >     sql(query)
> >         .withPresto().ok(expected)
> >         .withSpark().ok(sparkExpected);
> >   }
> >
> > Why does the Spark SQL have 'NULLS LAST' at the end? That information
> > is missing if we don't add a source rel or source dialect.
> >
> > 2. Dialect-to-dialect translation
> >
> > I think it's necessary. Dialect translation and materialized-view
> > substitution are common in the big-data domain, and it would be
> > beneficial to make Calcite more user-friendly for these scenarios.
> > Could we create end-to-end test cases that start with the source SQL
> > of one dialect and end with the target SQL of another (or the same)
> > dialect? We could also include user-defined materialized views in the
> > process and perform result comparison.
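The 'NULLS LAST' question above appears to come down to differing defaults: the collation in the RelNode is nulls-last, which (as I understand it) matches Presto's default ascending sort but not Spark's, where nulls sort first by default, so the converter must emit NULLS LAST explicitly for Spark to preserve the same result order. A stdlib Java sketch (an illustration of the two orderings, not Calcite code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullOrdering {
    public static void main(String[] args) {
        List<String> brands = Arrays.asList("Best", null, "Acme");

        // Spark-style default for ascending sort: nulls first.
        List<String> nullsFirst = new ArrayList<>(brands);
        nullsFirst.sort(Comparator.nullsFirst(Comparator.naturalOrder()));
        System.out.println(nullsFirst); // [null, Acme, Best]

        // Presto-style default for ascending sort: nulls last. To get this
        // order from Spark, the generated SQL must say NULLS LAST explicitly.
        List<String> nullsLast = new ArrayList<>(brands);
        nullsLast.sort(Comparator.nullsLast(Comparator.naturalOrder()));
        System.out.println(nullsLast); // [Acme, Best, null]
    }
}
```

This is why the same collation yields a bare ORDER BY for one dialect and an explicit NULLS clause for another: the clause is only redundant where it matches the dialect's default.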
> >
> > Julian Hyde <jhyde.apa...@gmail.com> wrote on Tue, Nov 19, 2024 at
> > 07:21:
> >
> >> A recent case, https://issues.apache.org/jira/browse/CALCITE-6693,
> >> "Add Source SQL Dialect to RelToSqlConverterTest", implies that people
> >> are using Calcite to translate SQL from one dialect to another. The
> >> test wanted to test translating a SQL string from Presto to Redshift.
> >> I pushed back on that case (and its PR) because that test is for
> >> translating RelNode to a SQL dialect, not about handling source
> >> dialects.
> >>
> >> Dialect-to-dialect translation is undoubtedly something that people do
> >> with Calcite. I think we should recognize that fact and document how
> >> someone can use Calcite as a translator. When we have documented it,
> >> we can also add some tests.
> >>
> >> I am also worried about our dialect tests in general. The surface area
> >> to be tested is huge, and the tests are added haphazardly, so while
> >> many cases are tested there is a much greater set of cases that are
> >> not. Consider, for example, how testSelectQueryWithGroupByEmpty [1]
> >> tests against MySQL, Presto and StarRocks but not against BigQuery,
> >> Snowflake or Postgres. If we want our SQL dialect support to be high
> >> quality, we have to find a way to improve the coverage of our tests. I
> >> logged https://issues.apache.org/jira/browse/CALCITE-5529 with some
> >> ideas, but I need help implementing it.
> >>
> >> Julian
> >>
> >> [1]
> >> https://github.com/apache/calcite/blob/f2ec11fe7e23ecf2db903bc02c40609242993aad/core/src/test/java/org/apache/calcite/rel/rel2sql/RelToSqlConverterTest.java#L577