paul-rogers opened a new pull request, #12965: URL: https://github.com/apache/druid/pull/12965
An upcoming PR will re-submit the [planner test framework](https://github.com/apache/druid/pull/12545). That framework requires more flexibility than we have today to customize the Planner and other aspects of the test setup. The original PR used a copy/paste approach to create a new `PlannerFixture` to allow building those components. This PR takes a "DRY" approach: rather than create a new copy, let's instead refactor the one we have so it can do the job for both the Calcite tests and the planner test framework. Note that this PR *does not* include the new test framework. That will come later to keep the PRs of reasonable size. The key changes here are: * Let the test create the injector as an alternative to always using `CalciteTests.INJECTOR`. * Let the tests add items to the framework in a way easier than subclassing `BaseCalciteTest`. * Split framework creation into a "class-wide" part that can be reused across tests, and a "test-specific" part. To achieve these goals, a number of refactoring moves were made. Before this PR, `CalciteTests` was a "kitchen sink" one-stop-shop to create the test data, conglomerates, segment walkers, planner factory, etc. Tough we create all the components per-test, they all depend on a static injector used for the entire class. Then, `BaseCalciteQueryTest` provides tools to customize: methods that can be overridden to customize the operator table, macro table, etc. The tests build up a set of "framework" items as fields. There is a single `CalciteTests.INJECTOR` used for common items. Then, for the parts that change per-test, we have code that, for every test, rebuilds the parts that vary. Those parts have dependencies: when the dependencies change (primarily the `PlannerConfig`), all other objects have to be recreated. The result is that we rebuild a large amount of the framework (including sample data) for every test method. ### Two-Phase Construction The refactoring breaks the "framework" components into two phases. The first creates the sample data, conglomerate and walker which are constant across tests. `BaseCalciteQueryTest` is modified to build this phase once per class (rather than once per test method as before.) The second phase builds the planner factory and associated objects per test. The driver here is `PlannerConfig` which several tests customize: anything that depends on `PlannerConfig` must be rebuilt including the root schema (which contains views which depends on the planner), the planner factory, and the "statement" factory (which uses the planner factory). ### Builders The planer tests want to run tests differently than the classic "Calcite query test" approach. To do that, we refactor framework creation into a series of builders which are independent of the `CalciteTests`/`BaseCalciteQueryTest` classes (but can be used by them.) Specifically, each of which accepts the injector to use rather than being hard-coded to use `CalciteTest.INJECTOR`. * `QueryFrameworkUtils` contains many of the low-level functions that build framework items. Each method accepts an injector. The `CalciteTests` versions still exist, but now just call the `QueryFrameworkUtils` versions with the `CalciteTests.INJECTOR`. * `QueryFramework` builds the entire framework, including conglomerate, walker, schema, planner factory, etc. To ensure backward compatibility, this class takes an interface, `QueryComponentSupplier` that has method names that match the existing `BaseCalciteQueryTest` methods so that tests need not change. * `TestDataBuilder` builds the test data. There is a large amount of code to build the segments, accessed by a single method. By pulling the code out into this class, it is easier to understand the remainder of `CalciteTests`, and to understand which method exist only to build the sample data. The `QueryFramework` is built per test class, and offers a `statementFactory()` method to build the per-test components. Objects that used to be fields of `BaseCalciteQueryTest` are now obtained from the framework instance. ### Split `PlannerConfig` It turns out that `PlannerConfig` holds properties used by the planner (as suggested by the name) _and_ for the Broker's segment metadata cache. Since `PlannerConfig` may change per test, we had to rebuild the segment cache per test. To solve this, `PlannerConfig` is split into two, with the segment cache values (only two) moving to a new `SegmentCacheMetadataConfig ` class. Both use the same base property, so there should be no change to behavior from the user's perspective. Since no test every changed the segment metadata config, we do not need to rebuild the segment metadata cache per test method. ### `testQuery()` methods All Calcite tests funnel though `testQuery()` methods. Thanks to a recent PR from @gianm, they funnel down to a single method that does the actual work. That method runs the query once or twice, depending on the "vectorize" settings. Each run uses the same planner (really, the same statement factory.) To avoid creating the same objects twice, we create the statement factory (and the root schema and the planner factory) before doing the vectorize-or-not loop, and reuse it for each query variation within that one test. ### The Road Not Taken The above refactoring still uses the "do-it-yourself" approach to assembling the many objects within the framework. A longer-term goal would be to rely on Guice to build these objects so that we run tests closer to the way we run the actual Druid server (and as a step toward a single-process Druid.) As it turns out, there are many additional issues to address to take this additional step, and those are left as a future exercise. This PR moves us toward that goal, however, by putting the construction of the framework behind a single builder class. ### Testing Most of the code in this PR affects only tests. As a result, the Calcite query tests themselves are comprehensive tests of the above changes. <hr> This PR has: - [X] been self-reviewed. - [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
