[GitHub] [druid] paul-rogers opened a new pull request, #12965: Modular Calcite Test Framework

GitBox Wed, 24 Aug 2022 15:46:53 -0700


paul-rogers opened a new pull request, #12965:
URL: https://github.com/apache/druid/pull/12965


   An upcoming PR will re-submit the [planner test 
framework](https://github.com/apache/druid/pull/12545). That framework requires 
more flexibility than we have today to customize the Planner and other aspects 
of the test setup. The original PR used a copy/paste approach to create a new 
`PlannerFixture` to allow building those components. This PR takes a "DRY" 
approach: rather than create a new copy, let's instead refactor the one we have 
so it can do the job for both the Calcite tests and the planner test framework.
   
   Note that this PR *does not* include the new test framework. That will come 
later to keep the PRs of reasonable size.
   
   The key changes here are:
   
   * Let the test create the injector as an alternative to always using 
`CalciteTests.INJECTOR`.
   * Let the tests add items to the framework in a way easier than subclassing 
`BaseCalciteTest`.
   * Split framework creation into a "class-wide" part that can be reused 
across tests, and a "test-specific" part.
   
   To achieve these goals, a number of refactoring moves were made.
   
   Before this PR, `CalciteTests` was a "kitchen sink" one-stop-shop to create 
the test data, conglomerates, segment walkers, planner factory, etc. Tough we 
create all the components per-test, they all depend on a static injector used 
for the entire class. Then, `BaseCalciteQueryTest` provides tools to customize: 
methods that can be overridden to customize the operator table, macro table, 
etc.
   
   The tests build up a set of "framework" items as fields. There is a single 
`CalciteTests.INJECTOR` used for common items. Then, for the parts that change 
per-test, we have code that, for every test, rebuilds the parts that vary. 
Those parts have dependencies: when the dependencies change (primarily the 
`PlannerConfig`), all other objects have to be recreated. The result is that we 
rebuild a large amount of the framework (including sample data) for every test 
method.
   
   ### Two-Phase Construction
   
   The refactoring breaks the "framework" components into two phases. The first 
creates the sample data, conglomerate and walker which are constant across 
tests. `BaseCalciteQueryTest` is modified to build this phase once per class 
(rather than once per test method as before.)
   
   The second phase builds the planner factory and associated objects per test. 
The driver here is `PlannerConfig` which several tests customize: anything that 
depends on `PlannerConfig` must be rebuilt including the root schema (which 
contains views which depends on the planner), the planner factory, and the 
"statement" factory (which uses the planner factory).
   
   ### Builders
   
   The planer tests want to run tests differently than the classic "Calcite 
query test" approach. To do that, we refactor framework creation into a series 
of builders which are independent of the `CalciteTests`/`BaseCalciteQueryTest` 
classes (but can be used by them.) Specifically, each of which accepts the 
injector to use rather than being hard-coded to use `CalciteTest.INJECTOR`.
   
   * `QueryFrameworkUtils` contains many of the low-level functions that build 
framework items. Each method accepts an injector. The `CalciteTests` versions 
still exist, but now just call the `QueryFrameworkUtils` versions with the 
`CalciteTests.INJECTOR`.
   * `QueryFramework` builds the entire framework, including conglomerate, 
walker, schema, planner factory, etc. To ensure backward compatibility, this 
class takes an interface, `QueryComponentSupplier` that has method names that 
match the existing `BaseCalciteQueryTest` methods so that tests need not change.
   * `TestDataBuilder` builds the test data. There is a large amount of code to 
build the segments, accessed by a single method. By pulling the code out into 
this class, it is easier to understand the remainder of `CalciteTests`, and to 
understand which method exist only to build the sample data.
   
   The `QueryFramework` is built per test class, and offers a 
`statementFactory()` method to build the per-test components. Objects that used 
to be fields of `BaseCalciteQueryTest` are now obtained from the framework 
instance.
   
   ### Split `PlannerConfig`
   
   It turns out that `PlannerConfig` holds properties used by the planner (as 
suggested by the name) _and_ for the Broker's segment metadata cache. Since 
`PlannerConfig` may change per test, we had to rebuild the segment cache per 
test.
   
   To solve this, `PlannerConfig` is split into two, with the segment cache 
values (only two) moving to a new `SegmentCacheMetadataConfig ` class. Both use 
the same base property, so there should be no change to behavior from the 
user's perspective. Since no test every changed the segment metadata config, we 
do not need to rebuild the segment metadata cache per test method.
   
   ### `testQuery()` methods
   
   All Calcite tests funnel though `testQuery()` methods. Thanks to a recent PR 
from @gianm, they funnel down to a single method that does the actual work. 
That method runs the query once or twice, depending on the "vectorize" 
settings. Each run uses the same planner (really, the same statement factory.)
   
   To avoid creating the same objects twice, we create the statement factory 
(and the root schema and the planner factory) before doing the vectorize-or-not 
loop, and reuse it for each query variation within that one test. 
   
   ### The Road Not Taken
   
   The above refactoring still uses the "do-it-yourself" approach to assembling 
the many objects within the framework. A longer-term goal would be to rely on 
Guice to build these objects so that we run tests closer to the way we run the 
actual Druid server (and as a step toward a single-process Druid.) As it turns 
out, there are many additional issues to address to take this additional step, 
and those are left as a future exercise. This PR moves us toward that goal, 
however, by putting the construction of the framework behind a single builder 
class.
   
   ### Testing
   
   Most of the code in this PR affects only tests. As a result, the Calcite 
query tests themselves are comprehensive tests of the above changes.
   
   <hr>
   
   This PR has:
   
   - [X] been self-reviewed.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] paul-rogers opened a new pull request, #12965: Modular Calcite Test Framework

Reply via email to