[
https://issues.apache.org/jira/browse/CALCITE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062454#comment-18062454
]
Stamatis Zampetakis commented on CALCITE-7420:
----------------------------------------------
I went through the draft PR shared by Julian and I really like the redesign of
the testing infrastructure. Worth mentioning that is even more ambitious and
complete than what I had in mind as a potential V1 so I am very eager to see
this past the finish line.
I outline below a few minor comments/suggestions that may be worth considering
but not really blocking (must-have) for moving this work forward.
1. Drop explicit test name from the Quidem files
{noformat}
# testAggregateConstantKeyRule -------------------------------------------------
{noformat}
In Java, we have no way to add a test without an explicit name but in Quidem we
don't need it and just increases the boilerplate in the test file. A free text
description about the purpose of the test is sufficient and much more
descriptive. If we want to standardize or separate the test cases we can use
the JIRA ID followed by the description of the test.
{noformat}
# [CALCITE-1023] AGGREGATE_ANY_PULL_UP_CONSTANTS removes constant group keys
and adds a Project above to restore them.
{noformat}
2. Drop results (ok!) from rule tests
Certainly there are many benefits in running the query and seeing the results
but by doing so we are shifting from unit tests to end-to-end tests. The tests
become slower and much more verbose and we may have to deal with flakiness and
result determinism (order, etc.) much more often. Plus, the query results are
completely irrelevant with the rule(s) under test so there is no obvious reason
to have them together. In various cases, when adding new rules we have small
variations of the same query so running each and every one of them may not be
that useful.
Having said that, I also acknowledge all the positives of having runnable
queries so if the majority finds them useful I am perfectly fine to adopt this
pattern.
3. Test grouping/naming
Each test is usually targeting one specific rule. For this reason, I would
suggest to have 1-1 mapping between a rule and its respective test file. For
the test file we can use kebab casing using the rule's name without the Rule
suffix. That way it becomes straightforward where to add a new test case and
where to find the tests for a particular rule. The PR already follows this
pattern to some extend. However, there are some tests that seem misplaced. For
instance, {{testFilterProjectTransposeRule2}} targeting the
{{FilterProjectTransposeRule}} is currently inside {{filter.iq}} file while at
the same time there exists a {{{}filter-project-transpose.iq{}}}. At some point
there are gonna be test cases that target more than one rule and there we will
have to figure out a name on a per-case basis.
4. Consolidate usages of sub-plan, set hep-rules, set planner-rules
The addition of the new {{sub-plan}} command has some things in common with the
existing mechanism of configuring rules with the {{set}} command. Ideally, we
shouldn't need to maintain all these variants that closely resemble each other.
The consolidation can certainly be done in a follow-up although if we had one
option to begin with it would save us from additional cycles of review and
refactoring discussions.
{noformat}
!sub-plan "relBuilderSimplify=false, trim=true, subQueryRules, NONE"
{noformat}
Aspects like simplification, trimming, and other, could be decoupled from the
sub-plan command and become general properties that take effect via the
property handler mechanism. From a quick look in {{QuidemTest}} it seems that
we already have entries such as "expand", "insubquerythreshold", "trimfields".
For "subQueryRules", instead of having a dedicated instruction we could
possibly enhance {{QuidemTest.getCoreRule}} to handle via reflection
Collection/Array fields and then have the respective rule collection defined in
{{CoreRules}} on another class.
> Convert RelOptRulesTest to Quidem scripts
> -----------------------------------------
>
> Key: CALCITE-7420
> URL: https://issues.apache.org/jira/browse/CALCITE-7420
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
>
> Many test cases in RelOptRulesTest are simple - involve just a SQL query, set
> of rules, before and after plan. If these were converted to Quidem scripts
> they would be easier to write (because you are writing just one file), more
> maintainable (because Quidem scripts merge more easily than XML), and could
> potentially be run from another language (which would enable ports of Calcite
> to language such as Rust or Python).
> Here is how one test would look:
> {noformat}
> # Pull aggregate through union ====================
> # Tests that AggregateUnionAggregateRule can convert
> # a union of two ’select distinct’ queries to a ’select distinct’
> # of a union.
> !set rules "AGGREGATE_UNION_AGGREGATE"
> select deptno, job
> from (select deptno, job
> from emp as e1
> group by deptno, job
> union all
> select deptno, job
> from emp as e2
> group by deptno, job)
> group by deptno, job;
> +--------+-----------+
> | DEPTNO | JOB |
> +--------+-----------+
> | 20 | CLERK |
> | 30 | SALESMAN |
> | 20 | MANAGER |
> | 30 | MANAGER |
> | 10 | MANAGER |
> | 20 | ANALYST |
> | 10 | PRESIDENT |
> | 30 | CLERK |
> | 10 | CLERK |
> +--------+-----------+
> (9 rows)
> !ok
> LogicalAggregate(group=[{0, 1}])
> LogicalUnion(all=[true])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> !plan-before
> LogicalAggregate(group=[{0, 1}])
> LogicalUnion(all=[true])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> !plan-after
> {noformat}
>
> It has the same information as the Java method
> {code:java}
> @Test void testPullAggregateThroughUnion() {
> final String sql = "select deptno, job from"
> + " (select deptno, job from emp as e1"
> + " group by deptno,job"
> + " union all"
> + " select deptno, job from emp as e2"
> + " group by deptno,job)"
> + " group by deptno,job";
> sql(sql)
> .withRule(CoreRules.AGGREGATE_UNION_AGGREGATE)
> .check();
> }
> {code}
> and XML resources
> {code:xml}
> <TestCase name="testPullAggregateThroughUnion">
> <Resource name="sql">
> <![CDATA[select deptno, job from (select deptno, job from emp as e1
> group by deptno,job union all select deptno, job from emp as e2 group by
> deptno,job) group by deptno,job]]>
> </Resource>
> <Resource name="planBefore">
> <![CDATA[
> LogicalAggregate(group=[{0, 1}])
> LogicalUnion(all=[true])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{0, 1}])
> LogicalProject(DEPTNO=[$7], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> ]]>
> </Resource>
> {code}
> It uses {{!set rules}} command (added in CALCITE-6335) and would require new
> {{!plan-before}} and {{!plan-after}} commands.
> I suspect that Claude would do a quite good job migrating tests from .java
> and .xml to .iq. I think it could also group tests by subject area (e.g. join
> and aggregate) so that .iq files are ~1,000 lines.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)