paul-rogers opened a new issue, #12815: URL: https://github.com/apache/druid/issues/12815
## Overview

Proposed is a set of refactoring steps to move Druid’s Guice usage from a set of ad-hoc routines into a set of builders, so that the mechanism can serve a variety of use cases beyond the service-per-server approach that Druid uses today.

## Motivation

Druid is designed to run as multiple servers, each housing a single service. Druid uses Guice to manage dependencies. The code to populate Guice is “hard coded” to assume this service architecture. However, there are several other uses of the Druid code base, none of which are well served by the present implementation.

* **Tests** either cobble together an injector or, more frequently, build up a set of objects using ad-hoc mechanisms. This has resulted in ever-more-complex setup code that mimics what Guice does. This ad-hoc code is hard to understand, fragile, and a barrier to rapidly creating more tests.
* **Clients**, such as the integration tests (ITs), use Druid code but do not run services. They twist themselves into a pretzel trying to use the current server-focused code without actually running a service.
* **Single-process** is the idea that, like many other Apache projects, we want to run Druid’s many services within a single Java process, at least for unit testing. (Other projects call this a “mini-server” or some such name.) In this case, we will configure multiple services using the same injector, but the current code is not designed for this use case.

## Goal

The goal, then, is to retain the *functionality* of Guice but change the *packaging* to allow us to create multiple configurations:

* Druid service-per-server
* Unit tests (without running a service)
* ITs (clients)
* Unit tests (running multiple Druid services)
* Others as we find the need

## Design

The key challenge with the current code is that we hard-code the collection of modules, and we have private implementations of the mechanisms used to gather dependencies.
The gist of this proposal is to refactor this code to be more open.

### Injector Builder

A new `InjectorBuilder` provides an easy way to build up the set of dependencies for a given run. This is the most basic builder: it makes no Druid assumptions and provides no default modules.

### Startup Injector Builder

Druid uses a two-stage process to build dependencies. The first (or “startup”) stage builds an injector with basic dependencies: JSON, properties, and a few others. Then, the second (or “CLI”) stage defines the set of service-dependent modules, which can themselves be injected with dependencies using the startup injector.

The startup injector includes the core Druid configuration mechanisms:

* The Jackson `ObjectMapper` for both JSON and Smile.
* The JSON configuration system, which builds config objects from properties.
* Null handling and expression handling.
* A bridge to the primary (CLI) injector.

The current startup injector then unfortunately adds two items which are used only in a server context:

* Configuration files from the class path.
* Runtime information about memory, processors, etc.

Since these two items are not needed by clients or tests, they are encapsulated in a separate `forServer()` method called only by servers, leaving the "basic" injector ready for use in clients and tests.

### Druid Injector Builder

Druid extends Guice in several ways:

* The `DruidModule` class, which adds Jackson modules to the `ObjectMapper`.
* Filtering of modules based on an exclude list from properties and on `LoadScope` annotations.
* Injection of dependencies from the startup injector into modules used for the primary injector.

The Druid injector builder handles these extra features. When used in tests and clients, there may be no module exclude list present, nor any node roles. When run on a server, this filtering is applied. This class absorbs the `ModuleList` functionality currently in `Initialization` so that it can be used outside of the various `CLI` classes.
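
As a rough illustration of the exclude-list filtering described for the Druid injector builder, here is a minimal sketch in plain Java. It does not use Guice or any Druid classes: `SketchModule`, `SketchInjectorBuilder`, and the two example modules are hypothetical names invented for this example, a `Map` stands in for Guice bindings, and modules are matched by simple class name purely for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a Guice module (not the real Guice or Druid interface).
interface SketchModule {
    // Registers entries in a name -> value map, which stands in for Guice bindings.
    void configure(Map<String, Object> bindings);
}

// Hypothetical example module: contributes a "json" entry.
final class JsonSketchModule implements SketchModule {
    @Override
    public void configure(Map<String, Object> bindings) {
        bindings.put("json", "object-mapper");
    }
}

// Hypothetical example module: contributes a "metrics" entry.
final class MetricsSketchModule implements SketchModule {
    @Override
    public void configure(Map<String, Object> bindings) {
        bindings.put("metrics", "emitter");
    }
}

// Hypothetical builder: collects modules and silently skips excluded ones.
final class SketchInjectorBuilder {
    private final List<SketchModule> modules = new ArrayList<>();
    private final Set<String> excluded;

    // Tests and clients can pass an empty set; a server would pass its exclude list.
    SketchInjectorBuilder(Set<String> excluded) {
        this.excluded = excluded;
    }

    SketchInjectorBuilder add(SketchModule module) {
        // Assumption: match on simple class name to keep the sketch short.
        if (!excluded.contains(module.getClass().getSimpleName())) {
            modules.add(module);
        }
        return this;
    }

    // "Builds the injector": here, just runs each accepted module in order.
    Map<String, Object> build() {
        Map<String, Object> bindings = new LinkedHashMap<>();
        for (SketchModule module : modules) {
            module.configure(bindings);
        }
        return bindings;
    }
}

final class InjectorBuilderSketch {
    public static void main(String[] args) {
        // Server-like use: an exclude list removes one module.
        Map<String, Object> filtered =
            new SketchInjectorBuilder(Collections.singleton("MetricsSketchModule"))
                .add(new JsonSketchModule())
                .add(new MetricsSketchModule())
                .build();
        System.out.println(filtered.keySet());

        // Test-like use: no exclude list, so every module is kept.
        Map<String, Object> unfiltered =
            new SketchInjectorBuilder(Collections.emptySet())
                .add(new JsonSketchModule())
                .add(new MetricsSketchModule())
                .build();
        System.out.println(unfiltered.keySet());
    }
}
```

The point of the sketch is only that the same builder serves both cases: the caller decides whether an exclude list exists, rather than the builder assuming a server environment.
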
### Server-specific Builders

Three builders combine to create the primary injector for a server:

* `CoreInjectorBuilder` holds the list of modules previously listed in the `Initialization` class.
* `ServiceInjectorBuilder` holds the list of service-specific modules.
* `ExtensionInjectorBuilder` holds the list of extension modules obtained from extensions on the class path.

These three builders provide overriding: later builders can override modules added in a previous builder. Combined, they replace the logic previously in `Initialization.makeInjectorWithModules()`.

`CoreInjectorBuilder` can be used in tests. In this case, it provides only logging and the Druid lifecycle manager. Tests can add other modules as needed for that specific test. (Tests that don't need either of these classes can use the startup injector builder.)

## Tests

Tests currently include rather complex code to either attempt to use Guice to create objects, or to work around Guice by hand-wiring components. A key challenge, as noted above, is that the existing injector-construction code assumes a server environment; tests must then somehow work around the fact that tests are not, in fact, servers.

This is particularly true in the "Calcite tests": there exists an elaborate set of ad-hoc code in `CalciteTests` to hand-wire a set of mock objects. The planner test PR struggled to refactor the code to allow more flexibility. A key motivation of this clean-up is to provide a framework that uses Guice to construct the Calcite test environment.

More details will be added as they are worked out.

## Clients

The integration tests (ITs) are essentially clients of the Druid server, but they use Guice to assemble various components needed for client functionality. The code to do this is quite complex and ad-hoc. The "new ITs" found that it is also fragile, since the existing logic was designed for server, not client, use.
A key goal of this proposal is to allow the ITs (and, in particular, the new version) to use a more solid approach to using Druid code in a client.
