Re: Fragmentation of dialect tests

Julian Hyde Sat, 15 Feb 2025 17:10:11 -0800

Thank you, everyone, for your words of support. It means a lot to me, when I 
jump into a big refactoring, to know that people approve of the direction and 
are prepared to help.

Step 1 is nearly complete: add a ‘done()’ call to each test in 
RelToSqlConverterTest so that it validates each statement against the Calcite 
‘reference dialect’, and potentially loops through other enabled dialects.

(A small fraction of statements are disabled, because a bug prevents Calcite 
from validating or executing. I have logged bugs for some of these, and I am 
grateful that people have started working on these bugs. If you supply PRs, I 
will rebase them onto my branch.)
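To make the shape of Step 1 concrete, here is a minimal Java sketch of a fluent fixture. Its done() first checks the query against a reference dialect and then loops over the enabled dialects. Everything here is an assumption for illustration. DialectFixture and expect() are hypothetical names. The translator function stands in for RelToSqlConverter, and the validator predicate stands in for Calcite's validator. None of this is the actual RelToSqlConverterTest API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Predicate;

// Hypothetical sketch; not Calcite's RelToSqlConverterTest fixture.
// "translator" stands in for RelToSqlConverter: (query, dialect) -> generated SQL.
// "validator" stands in for validating the query in the reference dialect.
final class DialectFixture {
  private final String query;
  private final BiFunction<String, String, String> translator;
  private final Predicate<String> validator;
  private final List<String> enabledDialects;
  // Expected SQL for each dialect, kept in declaration order.
  private final Map<String, String> expected = new LinkedHashMap<>();

  DialectFixture(String query,
      BiFunction<String, String, String> translator,
      Predicate<String> validator,
      List<String> enabledDialects) {
    this.query = query;
    this.translator = translator;
    this.validator = validator;
    this.enabledDialects = enabledDialects;
  }

  /** Records the SQL expected for one dialect; returns this for chaining. */
  DialectFixture expect(String dialect, String sql) {
    expected.put(dialect, sql);
    return this;
  }

  /** Validates the query against the reference dialect, then checks every
   * enabled dialect that has an expectation. Returns the dialects checked. */
  List<String> done() {
    if (!validator.test(query)) {
      throw new AssertionError("reference dialect rejected: " + query);
    }
    List<String> checked = new ArrayList<>();
    for (String dialect : enabledDialects) {
      String want = expected.get(dialect);
      if (want == null) {
        continue; // no expectation recorded for this dialect; skip it
      }
      String got = translator.apply(query, dialect);
      if (!want.equals(got)) {
        throw new AssertionError(
            dialect + ": expected <" + want + "> but got <" + got + ">");
      }
      checked.add(dialect);
    }
    return checked;
  }
}
```

With this shape, a single test method can declare expectations for many dialects. Enabled dialects that have no expectation are simply skipped.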

Step 2 will be to validate against one dialect (Postgres), and make sure that 
Postgres achieves the same result as the reference dialect. Running in this 
mode will require a local Postgres instance, so it will be disabled in main.
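A comparison like the one in Step 2 probably has to decide how to handle row order. The following is a minimal sketch, assuming rows have already been rendered as strings. It compares the two results as ordered lists when the query has an ORDER BY. Otherwise, it compares them as sorted lists (multisets), so duplicate rows still count. ResultComparer and sameResult are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Calcite: compares rows from a target
// dialect (e.g. Postgres) with rows from Calcite's reference dialect.
// Rows are assumed to be non-null strings already, so type differences
// (INTEGER vs BIGINT, say) are not handled here.
final class ResultComparer {
  private ResultComparer() {}

  /** Returns true if both results contain the same rows. If ordered is false
   * (the query had no ORDER BY), row order is ignored but duplicates count. */
  static boolean sameResult(List<String> reference, List<String> target,
      boolean ordered) {
    if (ordered) {
      return reference.equals(target);
    }
    // Sort copies by natural ordering so the inputs are not modified.
    List<String> a = new ArrayList<>(reference);
    List<String> b = new ArrayList<>(target);
    a.sort(null);
    b.sort(null);
    return a.equals(b);
  }
}
```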

Steps 2.1, 2.2, 2.3 will be to enable other dialects (StarRocks, MySQL, 
BigQuery, …). Contributions are welcome, but let’s get Postgres done first.

Step 3 will be to use Quidem recordings [1] so that you can run against 
Postgres (or any supported dialect) but not require Postgres to be present.
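The record/replay idea behind Step 3 can be sketched as follows. This is not Quidem's API, since the recording feature is still being discussed in [1]. RecordingExecutor and its RECORD/REPLAY modes are hypothetical. In RECORD mode, the sketch runs SQL against a live database and keeps the rows. In REPLAY mode, it answers from the recording, so no database needs to be running.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical record/replay sketch; not Quidem's API.
final class RecordingExecutor {
  enum Mode { RECORD, REPLAY }

  private final Mode mode;
  // Runs SQL against a live database; only called in RECORD mode.
  private final Function<String, List<String>> live;
  // Recorded rows, keyed by dialect and SQL text.
  private final Map<String, List<String>> recording;

  RecordingExecutor(Mode mode, Function<String, List<String>> live,
      Map<String, List<String>> recording) {
    this.mode = mode;
    this.live = live;
    this.recording = recording;
  }

  List<String> execute(String dialect, String sql) {
    String key = dialect + "\n" + sql;
    if (mode == Mode.RECORD) {
      List<String> rows = live.apply(sql);
      recording.put(key, rows); // remember the rows for later replay
      return rows;
    }
    List<String> rows = recording.get(key);
    if (rows == null) {
      throw new IllegalStateException(
          "no recording for " + dialect + ": " + sql);
    }
    return rows;
  }
}
```

A recording made once against a local Postgres could then be checked in, and CI could run in REPLAY mode.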

Step 4 will migrate tests, where possible, from the Foodmart data set to a 
smaller data set such as Scott (emp and dept). Foodmart's tables are rather 
large for dialect testing. For example, there are queries like ‘select 
substr('abc', 1, 1) from product’ that return the same value 10,000 times. 
Step 4 can happen before or after step 3.

Step 5 will be to apply this process to other tests. SqlOperatorTest is 
particularly exciting because it contains thousands of expressions and expected 
results. If we can get SqlOperatorTest passing against half a dozen dialects, we 
can truly claim to have a SQL-to-SQL translator for our huge set of SQL 
functions.

Julian

[1] https://github.com/hydromatic/quidem/issues/80 

> On Feb 15, 2025, at 6:09 AM, Cancai Cai <caic68...@gmail.com> wrote:
> 
> +1, very good proposal, really cool.
> 
> Thank you for doing these. After merging your first PR, I will see
> what I can do.
> 
> Maybe we can open a Jira case and split it into subtasks that people can claim.
> 
> Best wishes,
> Cancai Cai
> 
>> On Feb 15, 2025, at 18:07, Alessandro Solimando <alessandro.solima...@gmail.com> wrote:
>> 
>> We have too many dialect tests (RelToSqlConverterTest alone is 10,000
>> lines long) and we have too few (for many functions and operators, we
>> test the translation to SQL in only one or two dialects, and we never
>> actually execute that SQL).
>> 
>> I have a suggestion that I think will make a big difference: reduce
>> the fragmentation in RelToSqlConverterTest [1] by making one method
>> translate to several dialects.
>> 
>> For example, testSubstring [2] is a good test (it tests the
>> SUBSTRING function in 11 dialects), but it is accompanied by bad
>> ones (testSubstringInSpark [3] and testHiveSubstring [4] duplicate
>> the functionality of testSubstring for one dialect each).
>> 
>> This is an example of "tragedy of the commons". Many contributors have
>> an incentive to improve translation for just one dialect, but less
>> incentive to make the dialect system better for everyone. The solution
>> is for committers to force alignment by requiring that contributors
>> refactor existing tests rather than always adding new ones. I know
>> it's no fun to be the 'bad cop', but cops tend to be necessary to
>> allow the commons to prosper.
>> 
>> I am working on radically improving the dialect tests [4][5] so that
>> each test generates SQL for, and optionally executes against, the
>> dozen or so 'core dialects'. Rationalizing the existing tests will
>> complement that change. I would love people's help refactoring
>> the tests after my first commit is merged, but please don't make any
>> major changes to RelToSqlConverterTest before that merge has happened.
>
