[
https://issues.apache.org/jira/browse/CALCITE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062581#comment-18062581
]
Julian Hyde commented on CALCITE-7420:
--------------------------------------
All 4 of these points represent decisions that I had to take; I have sympathy
for [~zabetak]'s position, but came down on the other side.
Regarding 1, test names. They already exist, and they convey a lot of
information and structure. It is a small ask for the person creating a new test
case to generate a unique, descriptive name.
Test names are a very useful way to tie the new organization of tests back to
the old. (Jira case numbers are optional and not unique.) That relationship has
already proven useful, because Claude has not always faithfully copied
before/after plans.
Regarding 2. I think that query execution results make tests more concrete to a
human (see [literate
programming|https://en.wikipedia.org/wiki/Literate_programming]). Yes, this not
a SQL execution test, and yes, the rule being tested is not necessarily even
used in producing the plan. Including output encourages authors to choose a
lucid, compact example to illustrate the rule... essentially creating a
tutorial.
The {{!ok}} command has already discovered a couple of bugs (e.g. CALCITE-7427
from
{{{}testSortRemoveConstantKeyWhenOrderByIsNull{}}}). And it is highlighting
differences between the mock schema and the real scott schema, which, in the
interests of simplification, we should eliminate.
Regarding 3. I mostly agree, except that some tests have more than one rule,
and some rules have very few tests (and are therefore tested in groups, see
e.g. {{{}reduce-expressions.iq{}}}).
Regarding 4. As the mantra goes, "As simple as possible, but no simpler".
Remember that this test was an API-based test, not a SQL test. API-based
testing lets you twiddle many, many knobs. If we had not converted many of
those knobs to flags for the {{!sub-plan}} command – twelve at last count – we
would not have been able to migrate many of the Java test methods to SQL. After
the PR lands, you're welcome to try to eliminate the flags, one at a time.
Lastly, please remember that this is *not* a SQL-based end-to-end test. That is
not what we need, and it is not what it should become. This is, and should
remain, a *unit test* for planner rules. Moving is to SQL/Quidem has some
significant benefits, but it is remarkable that it can be converted to SQL at
all.
> Convert RelOptRulesTest to Quidem scripts
> -----------------------------------------
>
> Key: CALCITE-7420
> URL: https://issues.apache.org/jira/browse/CALCITE-7420
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
>
> Many test cases in RelOptRulesTest are simple - involve just a SQL query, set
> of rules, before and after plan. If these were converted to Quidem scripts
> they would be easier to write (because you are writing just one file), more
> maintainable (because Quidem scripts merge more easily than XML), and could
> potentially be run from another language (which would enable ports of Calcite
> to language such as Rust or Python).
> Here is how one test would look:
> {noformat}
> # Pull aggregate through union ====================
> # Tests that AggregateUnionAggregateRule can convert
> # a union of two ’select distinct’ queries to a ’select distinct’
> # of a union.
> !set rules "AGGREGATE_UNION_AGGREGATE"
> select deptno, job
> from (select deptno, job
> from emp as e1
> group by deptno, job
> union all
> select deptno, job
> from emp as e2
> group by deptno, job)
> group by deptno, job;
> +--------+-----------+
> | DEPTNO | JOB |
> +--------+-----------+
> | 20 | CLERK |
> | 30 | SALESMAN |
> | 20 | MANAGER |
> | 30 | MANAGER |
> | 10 | MANAGER |
> | 20 | ANALYST |
> | 10 | PRESIDENT |
> | 30 | CLERK |
> | 10 | CLERK |
> +--------+-----------+
> (9 rows)
> !ok
> LogicalAggregate(group=[{0, 1}])
> LogicalUnion(all=[true])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> !plan-before
> LogicalAggregate(group=[{0, 1}])
> LogicalUnion(all=[true])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> !plan-after
> {noformat}
>
> It has the same information as the Java method
> {code:java}
> @Test void testPullAggregateThroughUnion() {
> final String sql = "select deptno, job from"
> + " (select deptno, job from emp as e1"
> + " group by deptno,job"
> + " union all"
> + " select deptno, job from emp as e2"
> + " group by deptno,job)"
> + " group by deptno,job";
> sql(sql)
> .withRule(CoreRules.AGGREGATE_UNION_AGGREGATE)
> .check();
> }
> {code}
> and XML resources
> {code:xml}
> <TestCase name="testPullAggregateThroughUnion">
> <Resource name="sql">
> <![CDATA[select deptno, job from (select deptno, job from emp as e1
> group by deptno,job union all select deptno, job from emp as e2 group by
> deptno,job) group by deptno,job]]>
> </Resource>
> <Resource name="planBefore">
> <![CDATA[
> LogicalAggregate(group=[{0, 1}])
> LogicalUnion(all=[true])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> ]]>
> </Resource>
> {code}
> It uses {{!set rules}} command (added in CALCITE-6335) and would require new
> {{!plan-before}} and {{!plan-after}} commands.
> I suspect that Claude would do a quite good job migrating tests from .java
> and .xml to .iq. I think it could also group tests by subject area (e.g. join
> and aggregate) so that .iq files are ~1,000 lines.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)