paul-rogers opened a new pull request, #12545:
URL: https://github.com/apache/druid/pull/12545
### Description
SQL planners are complex and difficult to test. They are a black box: SQL
goes in, and results come out. From a testing perspective, a planner tends to
be a box of magic that either works or doesn’t. This PR provides a framework to
turn classic black-box planner testing into a white-box experience by capturing
and verifying the multiple planner artifacts. The framework is inspired by the
one used in Apache Impala.
The core idea is the “case” file which contains SQL statements, any required
query context or options, and expected results, serialized as text. Expected
results include the Calcite plan, the native query, the schema, the “physical
plan” (such as it is in Druid), and so on.
Creating tests is easy. This PR includes a “starter set” captured from the
various `Calcite?QueryTest` classes. To add a new one, add a test case with a
query, add dummy values for the desired artifacts and run the test. It will
fail, of course. Now, use your favorite diff tool to copy the “actual” results
to the “expected” results file (assuming the results are correct). After that,
rerun the tests after each planner change to ensure nothing changes (except
whatever you specifically wanted to change).
Tests of this sort are quite strict: change anything and the test will fail.
This can be painful if you just want to change something and are not concerned
about side-effects. It can, however, make changing the planner a breeze because
you get extreme visibility into the exact impact your change has: no surprises.
See the `README.md` file for full details of the case file syntax, how to
create a test, and how to convert existing test files.
## Example
The following shows a test converted from `CalciteQueryTest`, including the
various test case sections:
```text
==============================================================
Converted from testTopNLimitWrapping()
=== case
Top n limit wrapping
=== SQL
SELECT dim1, COUNT(*)
FROM druid.foo
GROUP BY dim1
ORDER BY dim1 DESC
=== context
sqlOuterLimit=2
=== schema
dim1 VARCHAR
EXPR$1 BIGINT
=== plan
LogicalSort(sort0=[$0], dir0=[DESC], fetch=[2:BIGINT])
LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
LogicalProject(dim1=[$2])
LogicalTableScan(table=[[druid, foo]])
=== native
{
"queryType" : "topN",
...
```
Multiple other sections are available: the above is one of the simpler cases.
## Running Queries
The `Calcite?QueryTest` cases verify the native query and results. Verifying
results is harder than one would expect: most tests run with two variations
(SQL and classic null handling), which produce different results. Others use
generators to run up to eight option variations for a total of 16 runs, with
differing results. Adding result checks to the test framework is thus
non-trivial and is left for a second PR.
Since the new framework can’t check results, we are left with using both the
old and new versions of converted tests. This is unfortunate: a goal for a
second PR is to retire the “legacy” test files. A big bonus: someone had to
laboriously write the Java code to build up native queries for those tests. In
the future, the computer will do it for us.
## PR Details
This section summarizes the many changes required to create the test
framework.
### Streamline Native Query Serialization
The test framework captures native queries in serialized JSON format.
Examining those queries identified a number of places where we omit trivial
fields: null values, default values, empty lists, and so on. This PR adds JSON
annotations to suppress such trivial fields. A couple of `CalciteQueryTest`
cases verify this format and were changed to reflect the new “lean” format.
The test framework requires that native query serialization be stable: the
same query is serialized in the same way every time. Fortunately, that is
already mostly true. The one exception found thus far is `InDimFilter` which
serializes a `Set` which produces a random serialization order. The
serialization function for that one property was changed to produce a sorted
result.
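As a minimal sketch of the stability problem (with hypothetical method names, not Druid's actual `InDimFilter` code): iterating a `HashSet` produces an arbitrary order, so the fix is to sort the values before writing them out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class StableSetSerialization {
    // Unstable: HashSet iteration order depends on hashing internals,
    // so repeated runs can serialize the same set differently.
    static String serializeUnstable(Set<String> values) {
        return values.stream().collect(Collectors.joining(",", "[", "]"));
    }

    // Stable: sort the values first, as the InDimFilter fix does for
    // the one property that serialized a Set.
    static String serializeStable(Set<String> values) {
        return values.stream().sorted().collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        Set<String> values = new HashSet<>(Arrays.asList("b", "c", "a"));
        System.out.println(serializeStable(values)); // always [a,b,c]
    }
}
```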
### Druid Planner State Capture
Planners want to be black boxes: query in, plan out. Tests want visibility
into the internal states as described above. The planner is modified to allow a
`PlannerStateCapture` instance which can record (capture) this internal state.
By default, the planner uses a "no op" capture. Tests provide a capturing
instance.
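The shape of the hook looks roughly like this (method names are illustrative, not the actual `PlannerStateCapture` API): default methods give the planner a no-op capture for free, while tests supply a recording implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative capture hook: the planner calls these methods as it
// produces each artifact. The defaults discard everything, so
// production code pays no cost.
interface PlannerStateCapture {
    default void captureSql(String sql) {}
    default void captureLogicalPlan(String plan) {}
    default void captureNativeQuery(String json) {}
}

// Test-side implementation that records each artifact by name for
// later comparison against the expected results in the case file.
class RecordingCapture implements PlannerStateCapture {
    final Map<String, String> artifacts = new LinkedHashMap<>();

    @Override public void captureSql(String sql) { artifacts.put("sql", sql); }
    @Override public void captureLogicalPlan(String plan) { artifacts.put("plan", plan); }
    @Override public void captureNativeQuery(String json) { artifacts.put("native", json); }
}

public class CaptureDemo {
    public static void main(String[] args) {
        RecordingCapture capture = new RecordingCapture();
        capture.captureSql("SELECT 1");
        System.out.println(capture.artifacts.keySet()); // [sql]
    }
}
```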
The code to parse SQL was repeated three times; it is now factored into a
single function.
Added the ability to capture the internal "execution plan" or "physical
plan" in the `QueryMaker` via a new `explain()` method. This method does
everything except send the query to another node for execution. The various
query makers are refactored to support the two paths (explain and execute). For
a SELECT, the physical plan is just the native plan with a few extra fields
sent to the Broker. For INSERT, it is whatever the particular INSERT extension
chooses to provide.
### Union Queries
As it turns out, SQL UNION ALL queries have no direct native query
representation: they turn into a list of native queries. Since we want to
serialize "the" native query, the framework creates a "fake" "union" native
query type on the fly to represent unions. Ugly, but it works.
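A toy sketch of the idea (the class and field names are hypothetical, not the framework's actual code): wrap the list of per-table native queries in a single synthetic "union" JSON object so there is one artifact to serialize and verify.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a SQL UNION ALL plans to a list of native
// queries, so the test framework wraps them in a synthetic "union"
// query type purely for serialization purposes.
public class FakeUnionQuery {
    static String toJson(List<String> nativeQueries) {
        return "{ \"queryType\" : \"union\", \"queries\" : ["
            + String.join(", ", nativeQueries) + "] }";
    }

    public static void main(String[] args) {
        System.out.println(toJson(Arrays.asList(
            "{ \"queryType\" : \"scan\" }",
            "{ \"queryType\" : \"scan\" }")));
    }
}
```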
### INSERT/REPLACE Statements
The INSERT and REPLACE support in the planner is a bit of a bolt-on. The
planner knows how to work only with SELECT statements, even when they are
wrapped in a DESCRIBE or INSERT/REPLACE. When exiting the Calcite logical plan,
the INSERT and REPLACE nodes don't appear. The test framework "invents" nodes
for these cases (in the test framework only) so that the logical plan makes
sense.
### Planner Config
The planner config class has grown to be a monster: it holds config not just
for the planner, but also for the segment metadata caching layer. This PR
resisted the temptation to fix that issue.
Instead, it fixes another issue: tests want to tinker with the planner
config via the "options" setting. But the planner config is opaque: the only
way to create one is via JSON deserialization. The class was modified to
provide a builder for use in tests.
The builder is then used in place of the detailed code in the
`withOverrides` method.
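The builder shape is the standard one; a minimal sketch (the field names here are examples, not the full `PlannerConfig`) shows how a test overrides only the options it cares about:

```java
// Illustrative builder: tests set only the options they care about and
// rely on production-style defaults for the rest.
public class PlannerConfigSketch {
    final int maxTopNLimit;
    final boolean useApproximateTopN;

    private PlannerConfigSketch(Builder b) {
        this.maxTopNLimit = b.maxTopNLimit;
        this.useApproximateTopN = b.useApproximateTopN;
    }

    static Builder builder() { return new Builder(); }

    static class Builder {
        int maxTopNLimit = 100_000;     // example defaults, not Druid's
        boolean useApproximateTopN = true;

        Builder maxTopNLimit(int n) { this.maxTopNLimit = n; return this; }
        Builder useApproximateTopN(boolean b) { this.useApproximateTopN = b; return this; }
        PlannerConfigSketch build() { return new PlannerConfigSketch(this); }
    }

    public static void main(String[] args) {
        PlannerConfigSketch config = PlannerConfigSketch.builder()
            .useApproximateTopN(false)
            .build();
        System.out.println(config.useApproximateTopN); // false
    }
}
```

The same builder can then back a `withOverrides`-style method: copy the current values into a builder, apply the overrides, and build a fresh immutable config.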
### Calcite Test Refactoring
`ExpressionProcessing` now allows a test to change the expression settings
on the fly. This lets the test case declare, via the "options" section, how it
wants to process expressions, allowing the variations to be tested in a single
run.
`Calcites` has an `escapeStringLiteral()` method which escapes string
literals, but it is a bit too aggressive and produces a hard-to-read string
with all punctuation converted to Unicode escapes. A new method,
`escapeStringLiteralLenient()`, is added to do the lightest-possible escaping
and produces much better output in the test cases.
The `BaseCalciteTest` class was refactored a bit to:
* Use the new `PlannerConfig.Builder` to create the various planner configs
rather than anonymous classes.
* Add support for planner state capture while running tests. Does nothing by
default; can be turned on to capture planner state when converting tests from
Java to the new framework. (This capture is distinct from the existing "hook"
mechanism to capture the native query while running the query.)
`RootSchemaBuilder` holds the code to build up Druid's Calcite root schema.
It is refactored to a builder so that tests can add to the root schema.
`CalciteTests` is refactored to use the above builder.
### Query Context
* Added more context variable key constants.
* A number of query context keys were found to be hard-coded in various
files and were changed to reference `QueryContext` constants instead.
* The `QueryContext` class has a set of handy `getBoolean()`, etc. methods,
but they can only be used via that class. The code was pulled out into
`QueryContexts` so that the test classes can use it for test options.
* Pulled the `override` method logic out of `QueryContext` into
`QueryContexts` so that the test framework can use it on maps.
* Added a "metadata" section for options that lists the option key and the
expected type. Also includes the default, when known. This metadata allows the
case file to use Java properties syntax for any needed context variables:
```text
=== context
useApproximateTopN=false
```
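A minimal sketch of how such metadata can drive type coercion (the keys shown are real context options, but the parser and its names are hypothetical): each known key maps to an expected type, and a `key=value` line from the case file is coerced accordingly.

```java
import java.util.Map;

// Hypothetical sketch of the "metadata" idea: each known option key
// carries an expected type so properties-syntax lines from a case file
// can be coerced to the right Java type.
public class ContextOptionParser {
    enum Type { BOOLEAN, LONG, STRING }

    static final Map<String, Type> METADATA = Map.of(
        "useApproximateTopN", Type.BOOLEAN,
        "sqlOuterLimit", Type.LONG
    );

    static Object parse(String line) {
        int eq = line.indexOf('=');
        String key = line.substring(0, eq).trim();
        String value = line.substring(eq + 1).trim();
        switch (METADATA.getOrDefault(key, Type.STRING)) {
            case BOOLEAN: return Boolean.parseBoolean(value);
            case LONG: return Long.parseLong(value);
            default: return value;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("useApproximateTopN=false")); // false
    }
}
```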
### Druid Planner Test
The new `DruidPlannerTest` class and its associated ".case" files hold
tests converted from:
* `CalciteInsertDmlTest`
* `CalciteArrayQueriesTest`
* `CalciteCorrelatedQueryTest`
* `CalciteJoinQueryTest`
* `CalciteMultiValueStringQueryTest`
* `CalciteParameterQueryTest`
* `CalciteQueryTest`
This isn't all such tests, just enough to allow a particular project to
proceed with confidence. More conversions can be done later. And, as noted, the
test file does not currently run the queries: it only plans them and verifies
the artifacts.
The other classes in
`sql/src/test/java/org/apache/druid/sql/calcite/tester/` support the test
framework:
* `README.md` contains instructions for using the test framework.
* `PlannerFixture` is a kitchen-sink class that encapsulates the code to set
up a planner (and all its associated knick-knacks) to run a test. Other tests
could use this class rather than doing the setup themselves, but that
conversion is not done in this PR.
* `QueryTestCase` represents one planner test case and its sections.
* `ActualResults` records the actual results of a run, compares them with
the expected results, and writes a report if there are errors.
* `CalciteTestRecorder` is the implementation of the mechanism that captures
planner state when running JUnit tests.
* `QueryTestCaseRunner` runs a test case (in the test framework) and
captures the actual results.
* `TestCaseLoader` is a simple parser for the test case file format.
* Other classes are lower-level helpers.
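The core of the `.case` format is simple enough to sketch in a few lines (this is an illustration of the section syntax shown earlier, not `TestCaseLoader` itself, which handles more, such as multiple cases per file): lines starting with `=== ` open a named section, and everything until the next header belongs to it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the ".case" section syntax: "=== name" headers,
// with the section body running until the next header.
public class CaseFileParser {
    static Map<String, String> parse(String text) {
        Map<String, String> sections = new LinkedHashMap<>();
        String current = null;
        StringBuilder body = new StringBuilder();
        for (String line : text.split("\n")) {
            if (line.startsWith("=== ")) {
                if (current != null) sections.put(current, body.toString().trim());
                current = line.substring(4).trim();
                body.setLength(0);
            } else if (current != null) {
                body.append(line).append('\n');
            }
        }
        if (current != null) sections.put(current, body.toString().trim());
        return sections;
    }

    public static void main(String[] args) {
        Map<String, String> s = parse("=== SQL\nSELECT 1\n=== context\nsqlOuterLimit=2");
        System.out.println(s.get("SQL")); // SELECT 1
    }
}
```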
<hr>
This PR has:
- [X] been self-reviewed.
- [X] added documentation for new or modified features or behaviors.
- [X] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]