paul-rogers opened a new pull request, #12545:
URL: https://github.com/apache/druid/pull/12545

   ### Description
   
   SQL planners are complex and difficult to test. They are a black box: SQL 
goes in, and results come out. From a testing perspective, a planner tends to 
be a box of magic that either works or doesn’t. This PR provides a framework to 
turn classic black-box planner testing into a white-box experience by capturing 
and verifying the multiple planner artifacts. The framework is inspired by the 
one used in Apache Impala.
   
   The core idea is the “case” file which contains SQL statements, any required 
query context or options, and expected results, serialized as text. Expected 
results include the Calcite plan, the native query, the schema, the “physical 
plan” (such as it is in Druid), and so on.
   
   Creating tests is easy. This PR includes a “starter set” captured from the 
various `Calcite?QueryTest` classes. To add a new one, add a test case with a 
query, add dummy values for the desired artifacts, and run the test. It will 
fail, of course. Now, use your favorite diff tool to copy the “actual” results 
into the “expected” results file (assuming the results are correct). After that, 
rerun the tests after each planner change to ensure nothing changes (except 
whatever you specifically wanted to change).
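
   For instance, a brand-new case might start out like the following sketch 
(the section markers follow the format documented in `README.md`; the query and 
placeholder values here are purely illustrative, and the first run will report 
the placeholders as mismatches to be copied over):
   
   ```text
   === case
   My new query
   === SQL
   SELECT COUNT(*) FROM druid.foo
   === plan
   bogus
   === native
   bogus
   ```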
   
   Tests of this sort are quite strict: change anything and the test will fail. 
This can be painful if you just want to change something and are not concerned 
about side-effects. It can, however, make changing the planner a breeze because 
you get extreme visibility into the exact impact your change has: no surprises.
   
   See the `README.md` file for full details of the case file syntax, how to 
create a test, and how to convert existing test files.
   
   ## Example
   
   The following shows a test converted from `CalciteQueryTest`, including the 
various test case sections:
   
   ```text
   ==============================================================
   Converted from testTopNLimitWrapping()
   === case
   Top n limit wrapping
   === SQL
   SELECT dim1, COUNT(*)
   FROM druid.foo
   GROUP BY dim1
   ORDER BY dim1 DESC
   === context
   sqlOuterLimit=2
   === schema
   dim1 VARCHAR
   EXPR$1 BIGINT
   === plan
   LogicalSort(sort0=[$0], dir0=[DESC], fetch=[2:BIGINT])
     LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
       LogicalProject(dim1=[$2])
         LogicalTableScan(table=[[druid, foo]])
   === native
   {
     "queryType" : "topN",
     ...
   ```
   
   Multiple other sections are available: the above is one of the simpler cases.
   
   ## Running Queries
   
   The `Calcite?QueryTest` cases verify the native query and results. Verifying 
results is harder than one would expect: most tests run with two variations 
(SQL-compatible and classic null handling) which produce different results. 
Others use generators to run up to eight option variations for a total of 16 
runs, with differing results. Adding result checks to the test framework is 
thus non-trivial and is left for a second PR.
   
   Since the new framework can’t check results, we are left with using both the 
old and new versions of converted tests. This is unfortunate: a goal for a 
second PR is to retire the “legacy” test files. A big bonus: someone had to 
laboriously write the Java code to build up native queries for those tests. In 
the future, the computer will do it for us.
   
   ## PR Details
   
   This section summarizes the many changes required to create the test 
framework.
   
   ### Streamline Native Query Serialization
   
   The test framework captures native queries in serialized JSON format. 
Examining those queries identified a number of places where serialization emits 
trivial fields: null values, default values, empty lists, and so on. This PR 
adds JSON annotations to suppress such trivial fields. A couple of 
`CalciteQueryTest` cases verify this format and were changed to reflect the new 
“lean” format.
   
   The test framework requires that native query serialization be stable: the 
same query is serialized in the same way every time. Fortunately, that is 
already mostly true. The one exception found thus far is `InDimFilter`, which 
serializes a `Set`, producing a nondeterministic serialization order. The 
serialization function for that one property was changed to produce a sorted 
result.
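
   The idea behind the fix can be sketched as follows (the names here are 
illustrative, not the actual `InDimFilter` code): rather than serializing the 
backing `Set` directly, whose iteration order is unspecified, the serializer 
emits a sorted copy so the output is identical on every run.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;
   
   // Hypothetical sketch of stable Set serialization.
   class StableSetSerialization {
       // Produce a deterministic, sorted view of the set for serialization.
       static List<String> serializedValues(Set<String> values) {
           List<String> sorted = new ArrayList<>(values);
           Collections.sort(sorted);
           return sorted;
       }
   
       public static void main(String[] args) {
           Set<String> values = new HashSet<>(Arrays.asList("b", "c", "a"));
           System.out.println(serializedValues(values)); // [a, b, c]
       }
   }
   ```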
   
   ### Druid Planner State Capture
   
   Planners want to be black boxes: query in, plan out. Tests want visibility 
into the internal states as described above. The planner is modified to allow a 
`PlannerStateCapture` instance which can record (capture) this internal state. 
By default, the planner uses a "no op" capture. Tests provide a capturing 
instance.
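
   The shape of the hook is roughly the following (a sketch only: the method 
and type names here are illustrative, not the exact API in this PR):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical sketch of the planner state capture hook.
   class CaptureSketch {
       interface PlannerStateCapture {
           default void captureSql(String sql) {}
           default void captureLogicalPlan(String plan) {}
           default void captureNativeQuery(String nativeJson) {}
       }
   
       // Default: planning pays essentially no cost when nothing listens.
       static final PlannerStateCapture NO_OP = new PlannerStateCapture() {};
   
       // Test-side implementation that records every artifact it sees.
       static class RecordingCapture implements PlannerStateCapture {
           final List<String> artifacts = new ArrayList<>();
   
           @Override public void captureSql(String sql) { artifacts.add("sql: " + sql); }
           @Override public void captureLogicalPlan(String plan) { artifacts.add("plan: " + plan); }
           @Override public void captureNativeQuery(String json) { artifacts.add("native: " + json); }
       }
   
       // Stand-in for the planner pipeline invoking the hook at each stage.
       static void plan(String sql, PlannerStateCapture capture) {
           capture.captureSql(sql);
           capture.captureLogicalPlan("LogicalTableScan(table=[[druid, foo]])");
           capture.captureNativeQuery("{\"queryType\": \"scan\"}");
       }
   
       public static void main(String[] args) {
           RecordingCapture recorder = new RecordingCapture();
           plan("SELECT dim1 FROM druid.foo", recorder);
           System.out.println(recorder.artifacts.size()); // 3
       }
   }
   ```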
   
   The code to parse SQL was repeated in three places; it is now factored into 
a single function.
   
   Added the ability to capture the internal "execution plan" or "physical 
plan" in the `QueryMaker` via a new `explain()` method. This method does 
everything except send the query to another node for execution. The various 
query makers are refactored to support the two paths (explain and execute). For 
a SELECT, the physical plan is just the native plan with a few extra fields 
sent to the Broker. For INSERT, it is whatever the particular INSERT extension 
chooses to provide.
   
   ### Union Queries
   
   As it turns out, SQL UNION ALL queries have no direct native query 
representation: they turn into a list of native queries. Since we want to 
serialize "the" native query, the framework creates a "fake" "union" native 
query type on the fly to represent unions. Ugly, but it works.
   
   ### INSERT/REPLACE Statements
   
   The INSERT and REPLACE support in the planner is a bit of a bolt-on. The 
planner knows how to work with SELECT statements only, including when they are 
wrapped in a DESCRIBE or INSERT/REPLACE. When exiting the Calcite logical plan, 
the INSERT and REPLACE nodes don't appear. The test framework "invents" nodes 
for these cases (in the test framework only) so that the logical plan makes 
sense.
   
   ### Planner Config
   
   The planner config class has grown to be a monster: it holds config not just 
for the planner, but also for the segment metadata caching layer. This PR 
resisted the temptation to fix that issue.
   
   Instead, it fixes another issue: tests want to tinker with the planner 
config via the "options" setting. But the planner config is opaque: the only 
way to create one is via JSON deserialization. The class is modified to provide 
a builder for use in tests.
   
   The builder is then used in place of the detailed code in the 
`withOverrides` method.
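
   The builder follows the usual pattern, roughly as sketched below (the field 
names are examples only, not the actual `PlannerConfig` properties): the config 
stays immutable and JSON-creatable, while tests construct variants 
programmatically.
   
   ```java
   // Hypothetical sketch of a config builder; names are illustrative.
   class PlannerConfigSketch {
       static class PlannerConfig {
           final boolean useApproximateTopN;
           final int maxTopNLimit;
   
           private PlannerConfig(Builder builder) {
               this.useApproximateTopN = builder.useApproximateTopN;
               this.maxTopNLimit = builder.maxTopNLimit;
           }
   
           static Builder builder() { return new Builder(); }
   
           static class Builder {
               // Defaults would mirror the production defaults.
               private boolean useApproximateTopN = true;
               private int maxTopNLimit = 100_000;
   
               Builder useApproximateTopN(boolean flag) { this.useApproximateTopN = flag; return this; }
               Builder maxTopNLimit(int limit) { this.maxTopNLimit = limit; return this; }
               PlannerConfig build() { return new PlannerConfig(this); }
           }
       }
   
       public static void main(String[] args) {
           PlannerConfig config = PlannerConfig.builder()
               .useApproximateTopN(false)
               .build();
           System.out.println(config.useApproximateTopN); // false
       }
   }
   ```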
   
   ### Calcite Test Refactoring
   
   `ExpressionProcessing` now allows a test to change the expression settings 
on the fly. This lets the test case declare, via the "options" section, how it 
wants to process expressions, allowing the variations to be tested in a single 
run.
   
   `Calcites` has an `escapeStringLiteral()` method which escapes string 
literals, but it is a bit too aggressive and results in a hard-to-read string 
with all punctuation converted to Unicode escapes. A new method, 
`escapeStringLiteralLenient()`, is added to do the lightest-possible escaping 
and results in much better output in the test cases.
    
   The `BaseCalciteTest` class was refactored a bit to:
   
   * Use the new `PlannerConfig.Builder` to create the various planner configs 
rather than anonymous classes.
   * Add support for planner state capture while running tests. Does nothing by 
default; can be turned on to capture planner state when converting tests from 
Java to the new framework. (This capture is distinct from the existing "hook" 
mechanism to capture the native query while running the query.)
   
   `RootSchemaBuilder` holds the code to build up Druid's Calcite root schema. 
It is refactored to a builder so that tests can add to the root schema.
   `CalciteTests` is refactored to use the above builder.
   
   ### Query Context
   
   * Added more context variable key constants.
   * A number of query context keys were found to be hard-coded in various 
files and were changed to reference `QueryContext` constants instead.
   * The `QueryContext` class has a set of handy `getBoolean()`, etc. methods, 
but they could only be used via that class. The code was pulled out into 
`QueryContexts` so that the test classes can use it for test options.
   * Pulled the `override` method logic out of `QueryContext` into 
`QueryContexts` so that the test framework can use it on maps.
   * Added a "metadata" section for options that lists the option key and the 
expected type. Also includes the default, when known. This metadata allows the 
case file to use Java properties syntax for any needed context variables:
   
   ```text
   === context
   useApproximateTopN=false
   ```
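
   The static accessors pulled out for use on plain maps look roughly like the 
following sketch (the actual `QueryContexts` signatures in the PR may differ):
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Hypothetical sketch of a map-based context accessor.
   class QueryContextsSketch {
       static boolean getBoolean(Map<String, Object> context, String key, boolean defaultValue) {
           Object value = context.get(key);
           if (value == null) {
               return defaultValue;
           }
           if (value instanceof Boolean) {
               return (Boolean) value;
           }
           // Values arrive as strings when read from properties-style text.
           return Boolean.parseBoolean(value.toString());
       }
   
       public static void main(String[] args) {
           Map<String, Object> context = new HashMap<>();
           context.put("useApproximateTopN", "false");
           System.out.println(getBoolean(context, "useApproximateTopN", true)); // false
       }
   }
   ```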
   
   ### Druid Planner Test
   
   The new `DruidPlannerTest` class, and its associated ".case" files, holds 
tests converted from:
   
   * `CalciteInsertDmlTest`
   * `CalciteArrayQueriesTest`
   * `CalciteCorrelatedQueryTest`
   * `CalciteJoinQueryTest`
   * `CalciteMultiValueStringQueryTest`
   * `CalciteParameterQueryTest`
   * `CalciteQueryTest`
   
   This isn't all such tests, just enough to allow a particular project to 
proceed with confidence. More conversions can be done later. And, as noted, the 
test framework does not currently run the queries: it only plans them and 
verifies the artifacts.
   
   The other classes in 
`sql/src/test/java/org/apache/druid/sql/calcite/tester/` support the test 
framework:
   
   * `README.md` contains instructions for using the test framework.
   * `PlannerFixture` is a kitchen-sink class that encapsulates the code to set 
up a planner (and all its associated knick-knacks) to run a test. Other tests 
could use this class rather than doing the setup themselves, but that 
conversion is not done in this PR.
   * `QueryTestCase` represents one planner test case and its sections.
   * `ActualResults` records the actual results of a run, compares them with 
the expected results, and writes a report if there are errors.
   * `CalciteTestRecorder` is the implementation of the mechanism that captures 
planner state when running JUnit tests.
   * `QueryTestCaseRunner` runs a test case (in the test framework) and 
captures the actual results.
   * `TestCaseLoader` is a simple parser for the test case file format.
   * Other classes are lower-level helpers.
   
   <hr>
   
   This PR has:
   - [X] been self-reviewed.
   - [X] added documentation for new or modified features or behaviors.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever it would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   

