TL;DR: We need to add software engineering processes to ensure that
Calcite is consistent with other SQL systems. What should those be?

There has been a lot of activity recently adding functions such as
LOG2 [1] and ensuring that existing functions such as LOG and SQRT [2]
are consistent with other implementations. Another case was about
whether Calcite should allow casting a VARCHAR to a VARBINARY [3].

These discussions have been time-consuming. For example, the LOG2 PR
received almost a hundred comments, and that is a function that should
be simple to specify and implement. If a DB did not have LOG2, instead
of LOG2(n) I would just write LOG(n) / LOG(2) and it would work.

In one of the discussions a statement was made that "SQL rules say
that SQRT should return NULL for negative values". I checked Oracle --
the definitive SQL implementation for many years -- and its doc says
that it returns NaN.

In another incident[4], the TO_CHAR function was removed from the
MySQL library and then added back when someone pointed out that it is
present in MariaDB, and our MySQL library covers MariaDB.

We just can't keep doing this. We need to quickly and efficiently
arrive at the correct specification for when Calcite is attempting to
emulate other SQL systems, and then we need to stay with that
specification. I propose that we use software engineering.

One idea is to create a Quidem script that can be run against MySQL,
and passes, and when run against Calcite-pretending-to-be-MySQL, it
also passes. Anyone adding to that script must test it against a MySQL
instance before merging to main. And similarly for other SQL dialects.

As we encounter underspecified areas - e.g. the behavior of SQRT when
applied to -0 - we can resolve them by adding to that script.

Does anyone have any ideas for how we can solve this using automated
tools, backed by changes to our process, rather than by endless
discussions?

(By the way, this discussion relates to libraries (the "fun"
parameter), conformance and our parser settings (the "lex" parameter,
quoted and unquoted casing, etc.). We could have a similar discussion
about dialects - the SQL generated by the JDBC adapter - but please
let's have that discussion in another thread.)

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-6224
[2] https://issues.apache.org/jira/browse/CALCITE-5638
[3] https://issues.apache.org/jira/browse/CALCITE-6210
[4] https://issues.apache.org/jira/browse/CALCITE-6222

Reply via email to