paul-rogers opened a new issue, #12815:
URL: https://github.com/apache/druid/issues/12815

   ## Overview
   
   Proposed is a set of refactoring steps to move Druid’s Guice usage from a 
set of ad-hoc routines into a set of builders, so that the mechanism can be 
used for a variety of use cases beyond the service-per-server approach which 
Druid uses today.
   
   ## Motivation
   
   Druid is designed to run as multiple servers, each housing a single 
service. Druid uses Guice to manage dependencies. The code to populate Guice is 
“hard coded” to assume this service-per-server architecture.
   
   However, it turns out there are multiple other uses of the Druid code base, 
none of which are well served by the present implementation.
   
   * **Tests** either cobble together an injector or, more frequently, build up 
a set of objects using ad-hoc mechanisms. This has resulted in 
ever-more-complex bits of setup code which mimic what Guice does. This ad-hoc 
code is hard to understand and fragile, and it acts as a barrier to rapidly 
creating more tests.
   * **Clients**, such as the integration tests (ITs), use Druid code but do 
not run services. They twist themselves into a pretzel trying to use the 
current server-focused code, without actually running a service.
   * **Single-process** is the idea that, like many other Apache projects, we 
want to run Druid’s many services within a single Java process, at least for 
unit testing. (Other projects call this a “mini-server” or some such name.) In 
this case, we will configure multiple services using the same injector, but the 
current code is not designed for this use case.
   
   ## Goal
   
   The goal, then, is to retain the *functionality* of Guice, but change the 
*packaging* to allow us to create multiple configurations:
   
   * Druid service-per-server
   * Unit tests (without running a service)
   * ITs (clients)
   * Unit tests (running multiple Druid services)
   * Others as we find the need
   
   ## Design
   
   The key challenge with the current code is that we hard-code the collection 
of modules, and we have private implementations of the mechanisms used to 
gather dependencies. The gist of this proposal is to refactor this code to be 
more open.
   
   ### Injector Builder
   
   A new `InjectorBuilder` provides an easy way to build up the set of 
dependencies for a given run. This is the most basic builder: it has no Druid 
assumptions and provides no default modules.
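   
   As a rough illustration of the builder described above, the following 
sketch models the idea in plain Java. `SketchModule`, `SketchInjector`, and 
`SketchInjectorBuilder` are hypothetical stand-ins for Guice's `Module` and 
`Injector` and for the proposed `InjectorBuilder`. This proposal does not 
spell out the real class's methods, so the `add()` and `build()` names are 
assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a Guice module: it registers bindings into a binder map.
interface SketchModule {
    void configure(Map<Class<?>, Object> binder);
}

// Stand-in for an injector: an immutable lookup from type to instance.
final class SketchInjector {
    private final Map<Class<?>, Object> bindings;

    SketchInjector(Map<Class<?>, Object> bindings) {
        this.bindings = Map.copyOf(bindings);
    }

    <T> T getInstance(Class<T> type) {
        Object value = bindings.get(type);
        if (value == null) {
            throw new IllegalStateException("No binding for " + type.getName());
        }
        return type.cast(value);
    }
}

// Hypothetical builder: it starts empty, with no default modules
// and no Druid-specific assumptions.
final class SketchInjectorBuilder {
    private final List<SketchModule> modules = new ArrayList<>();

    SketchInjectorBuilder add(SketchModule... toAdd) {
        modules.addAll(List.of(toAdd));
        return this;
    }

    SketchInjector build() {
        Map<Class<?>, Object> binder = new LinkedHashMap<>();
        // Modules are applied in the order in which they were added.
        for (SketchModule module : modules) {
            module.configure(binder);
        }
        return new SketchInjector(binder);
    }
}
```

   With a builder of this shape, a test could assemble only what it needs, 
for example `new SketchInjectorBuilder().add(binder -> 
binder.put(String.class, "druid")).build()`.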
   
   ### Startup Injector Builder
   
   Druid uses a two-stage process to build dependencies. The first (or 
“startup”) stage builds an injector with basic dependencies: JSON, properties, 
and a few others. Then, the second (or “CLI”) stage defines the set of 
service-dependent modules, which can themselves be injected with dependencies 
using the startup injector.
   
   The startup injector includes the core Druid configuration mechanisms:
   
   * The Jackson `ObjectMapper` for both JSON and Smile.
   * The JSON configuration system which builds config objects from properties
   * Null handling and expression handling.
   * A bridge to the primary (CLI) injector.
   
   The current startup injector then unfortunately adds two items which are 
used only in a server context:
   
   * Configuration files from the class path
   * Runtime information about memory, processors, etc.
   
   Since the above two items are not needed by clients or tests, they are 
encapsulated in a separate `forServer()` method called only by servers, leaving 
the "basic" injector ready for use in clients and tests.
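   
   The split between the basic and server-only items might look roughly like 
the sketch below. It is not the actual Druid class: `StartupBuilderSketch` is a 
hypothetical name, and the module labels are placeholder strings standing in 
for the real Guice modules. Only the `forServer()` method name comes from this 
proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical startup builder. Each string is a placeholder label
// for a Guice module, not a real Druid class name.
final class StartupBuilderSketch {
    private final List<String> modules = new ArrayList<>();

    StartupBuilderSketch() {
        // Core configuration mechanisms, needed in every context.
        modules.add("jackson-object-mappers");
        modules.add("json-config");
        modules.add("null-and-expression-handling");
        modules.add("primary-injector-bridge");
    }

    // Server-only additions; clients and tests skip this call.
    StartupBuilderSketch forServer() {
        modules.add("classpath-config-files");
        modules.add("runtime-info");
        return this;
    }

    List<String> build() {
        return List.copyOf(modules);
    }
}
```

   In this sketch a server would call `new 
StartupBuilderSketch().forServer().build()`, while a client or test would call 
`build()` directly and get only the four basic entries.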
   
   ### Druid Injector Builder
   
   Druid extends Guice in several ways:
   
   * The `DruidModule` class which adds Jackson modules to the `ObjectMapper`
   * Filtering of modules based on an exclude list from properties, and 
`LoadScope` annotations.
   * Injection of dependencies from the startup injector into modules used for 
the primary injector.
   
   The Druid injector builder handles these extra features. When used in tests 
and clients, there may be no module exclude list present, nor any node roles, 
so no filtering takes place. When run on a server, the filtering is applied.
   
   This class absorbs the `ModuleList` functionality currently in 
`Initialization` so that it can be used outside of the various `CLI` classes.
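   
   The filtering step could be sketched as follows. This is an illustration 
under stated assumptions rather than Druid's implementation: `SketchLoadScope` 
is a hypothetical stand-in for the `LoadScope` annotation, roles are plain 
strings, the exclude list is a set of class names, and a module is kept when 
no node roles are configured (the test and client case).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.Set;

// Hypothetical stand-in for Druid's LoadScope annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchLoadScope {
    String[] roles();
}

final class ModuleFilterSketch {
    private final Set<String> excludedClassNames;
    private final Set<String> nodeRoles;

    ModuleFilterSketch(Set<String> excludedClassNames, Set<String> nodeRoles) {
        this.excludedClassNames = Set.copyOf(excludedClassNames);
        this.nodeRoles = Set.copyOf(nodeRoles);
    }

    boolean accept(Object module) {
        Class<?> type = module.getClass();
        // The exclude list always applies when it is present.
        if (excludedClassNames.contains(type.getName())) {
            return false;
        }
        SketchLoadScope scope = type.getAnnotation(SketchLoadScope.class);
        // Assumption: unannotated modules load everywhere, and with no
        // node roles configured (tests, clients) nothing is filtered by role.
        if (scope == null || nodeRoles.isEmpty()) {
            return true;
        }
        return Arrays.stream(scope.roles()).anyMatch(nodeRoles::contains);
    }
}

// Example modules used only to exercise the filter.
@SketchLoadScope(roles = {"broker"})
final class BrokerOnlyModule {}

final class CommonModule {}
```

   Here a filter built with the role `"historical"` would reject 
`BrokerOnlyModule` and accept `CommonModule`, while a filter built with empty 
sets would accept both.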
   
   ### Server-specific Builders
   
   Three builders combine to create the primary injector for a server:
   
   * `CoreInjectorBuilder` holds the list of modules previously listed in the 
`Initialization` class.
   * `ServiceInjectorBuilder` holds the list of service-specific modules.
   * `ExtensionInjectorBuilder` holds the list of extension modules obtained 
from extensions on the class path.
   
   These three builders provide overriding: later builders can override modules 
added in a previous builder. Combined, they replace the logic previously in 
`Initialization.makeInjectorWithModules()`.
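   
   The override order can be pictured with a small model in which each builder 
contributes a layer of bindings and later layers win. The sketch below is only 
that model: `LayeredBindingsSketch` is a hypothetical name, and bindings are 
reduced to string keys and values. Guice itself offers 
`Modules.override(...).with(...)` for layering real modules; whether the 
proposed builders rely on it is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LayeredBindingsSketch {
    // Applies the layers in order (core, then service, then extension);
    // a later layer replaces any key already bound by an earlier one.
    static Map<String, String> combine(List<Map<String, String>> layers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            result.putAll(layer);
        }
        return result;
    }
}
```

   For example, if the core layer binds a placeholder key `"emitter"` to 
`"core-default"` and the extension layer binds the same key to `"extension"`, 
the combined result maps `"emitter"` to `"extension"`, and keys bound only by 
the core layer are left unchanged.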
   
   `CoreInjectorBuilder` can be used in tests. In this case, it provides only 
logging and the Druid lifecycle manager. Tests can add other modules as needed 
for that specific test. (Tests that don't need either of these classes can use 
the startup injector builder.)
   
   ## Tests
   
   Tests currently include rather complex code to either attempt to use Guice 
to create objects, or to work around Guice by hand-wiring components. A key 
challenge, as noted above, is that the existing injector-construction code 
assumes a server environment; tests must then somehow work around the fact that 
tests are not, in fact, servers.
   
   This is particularly true in the "Calcite tests": there exists an elaborate 
set of ad-hoc code in `CalciteTests` to hand-wire a set of mock objects. The 
planner test PR struggled to refactor the code to allow more flexibility. A key 
motivation of this clean-up is to provide a framework that uses Guice to 
construct the Calcite test environment.
   
   More details will be added as they are worked out.
   
   ## Clients
   
   The integration tests (ITs) are essentially clients of the Druid server, but 
they use Guice to assemble various components needed for client functionality. 
The code to do this is quite complex and ad-hoc. Work on the "new ITs" found 
that it is also fragile, since the existing logic was designed for server use, 
not client use.
   
   A key goal of this proposal is to allow the ITs (and, in particular, the new 
version) to use a more solid approach to using Druid code in a client.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

