paul-rogers opened a new pull request, #12641: URL: https://github.com/apache/druid/pull/12641
Issue #11933 proposed using the industry-standard operator DAG structure for Druid queries in place of the existing Sequence-based approach. The issue has a lengthy discussion of the reasons. Separately, issue #12262 proposes a multi-stage query engine for Druid, focused on long-running report-style queries and ingestion. Extending that idea to the low-latency space seems to demand that we start with what we already have, which has proven itself rock-solid in many production shops.

Putting the two together: to create a multi-stage solution for Druid's low-latency query path, we propose to evolve what we have, step by step, toward the industry-standard operator DAG approach. That will allow us to introduce multi-stage queries within the existing framework.

This PR is a first step: it provides the foundation structure. The code here has already been used to create a full operator-based solution for [scan queries in the context of the historical node](https://github.com/paul-rogers/druid/tree/op-step1) and to fully convert the scan query path for the [test query stack](https://github.com/paul-rogers/druid/tree/op-step2). That work will be contributed, step by step, building on top of this PR.

See [the README](https://github.com/paul-rogers/druid/blob/20942c83c23d7bae516bace80fdf07b3603067a5/processing/src/main/java/org/apache/druid/queryng/README.md) for more details.

### Operators

An operator does one task in a data pipeline. The key operator abstractions include:

* `Operator`: an interface for a data pipeline component. An operator can be opened to provide an iterator over results, then closed. An operator can have zero inputs (a leaf operator), one input (a filter, limit or projection operator) or multiple inputs (join, merge, union, etc.).

Multiple variations of operators are provided in this PR. All of these operators are simple in the sense that they refer only to other operators, not to any of Druid's query infrastructure.
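To make the open, iterate, close protocol described above concrete, here is a minimal Java sketch. It is not the code in this PR: the names `Op`, `ListOp`, `LimitOp` and `drain` are hypothetical stand-ins, and the real `Operator` interface may use different method signatures. It only illustrates a leaf operator with zero inputs feeding a single-input limit operator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class OperatorSketch {

  /** Hypothetical stand-in for an operator: open to get an iterator, then close. */
  interface Op<T> {
    Iterator<T> open();
    void close();
  }

  /** Leaf operator (zero inputs): emits the rows of an in-memory list. */
  static class ListOp<T> implements Op<T> {
    private final List<T> rows;
    boolean closed;

    ListOp(List<T> rows) { this.rows = rows; }

    @Override public Iterator<T> open() { closed = false; return rows.iterator(); }
    @Override public void close() { closed = true; }
  }

  /** Single-input operator: passes through at most {@code limit} rows. */
  static class LimitOp<T> implements Op<T> {
    private final Op<T> input;
    private final long limit;

    LimitOp(Op<T> input, long limit) { this.input = input; this.limit = limit; }

    @Override public Iterator<T> open() {
      final Iterator<T> in = input.open();
      return new Iterator<T>() {
        private long count;

        @Override public boolean hasNext() { return count < limit && in.hasNext(); }

        @Override public T next() {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          count++;
          return in.next();
        }
      };
    }

    // Closing the limit operator closes its input as well.
    @Override public void close() { input.close(); }
  }

  /** Opens the root operator, reads every row, and always closes the root. */
  static <T> List<T> drain(Op<T> root) {
    List<T> out = new ArrayList<>();
    try {
      Iterator<T> it = root.open();
      while (it.hasNext()) {
        out.add(it.next());
      }
    } finally {
      root.close();
    }
    return out;
  }

  public static void main(String[] args) {
    ListOp<Integer> leaf = new ListOp<>(List.of(1, 2, 3, 4, 5));
    System.out.println(drain(new LimitOp<>(leaf, 2))); // prints [1, 2]
  }
}
```

In this sketch the limit operator knows only about its input operator and nothing else, which is the kind of simplicity described above and is what makes such components easy to unit test in isolation.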
* `LimitOperator`: applies a limit to a result set.
* `NullOperator`: does nothing, like an empty list or empty iterator.
* `MappingOperator`: takes one input and applies some form of mapping as defined by a derived class.
* `ConcatOperator`: performs a union of its inputs, emitting each one after the other.
* `OrderedMergeOperator`: implements an ordered merge of multiple inputs.
* `WrappingOperator`: similar to "baggage" on sequences: an operator that does tasks at the start and end of a result set, but imposes no per-row overhead.

### Fragments

Operators combine to form a data pipeline. Data pipelines are distributed, as in Druid's scatter/gather architecture. A common terminology is to say that the entire query forms a DAG. The DAG is "sliced" at node boundaries, with exchanges between slices. At runtime, a *slice* is replicated across many nodes. Each instance of a slice is a *fragment*. This PR provides the basics of the fragment structure.

In most engines, a planner converts SQL into a logical plan, then into a physical plan that describes the operator DAG. Slices of that plan are sent to nodes, which then execute the fragments.

Druid, however, already has an existing `QueryRunner`-based structure. `QueryRunner`s are actually "query planners": the `QueryRunner.run()` method is better thought of as `QueryPlanner.plan()`. It figures out what sequence is needed at that point in the pipeline and creates that sequence.

Our first step on the path to adopting operators is to reuse the query runners. Instead of creating sequences, we modify `QueryRunner`s to create operators. The fragment-related abstractions in this PR support such an approach.

* `FragmentContext`: the state shared by all operators in a fragment. For now, this state includes the `ResponseContext` and, internally, the collection of all operators that form the fragment.
* `FragmentBuilder`: creates a fragment from a collection of operators, and provides an API to run the resulting fragment.
* `FragmentRun`: runs the fragment, which means calling `open()` on the root operator, returning the root operator's iterator, and closing all operators at the completion of the run.
* `FragmentBuilderFactory`: a factory to create a fragment builder. This class will be injected via Guice.

We will need a way to pass the `FragmentContext` to `QueryRunner`s so that they can create operators for a fragment. It turns out that `QueryPlus` is a handy way to accomplish this, so this PR contains the required `QueryPlus` code. That code isn't used yet: we're just setting things up.

### Configuration

This PR also provides a very basic configuration system which reports that the operator approach is enabled only for scan queries and only if `-Ddruid.queryng=true` is set on the command line. This is a temporary approach, good enough for testing. Nothing uses that config yet: it will be used in the next PR to allow `QueryRunner`s to know when to use an operator implementation and when to continue to use sequences.

### Tests

One of the very handy things about operators is that they are highly modular and thus extremely easy to unit test. Tests exist for all the basic abstractions defined above.

### Next Steps

The goal of this PR is for reviewers to focus on the core abstractions. The next PRs will begin to create the parallel operator path for scan queries. They will provide operators converted from the existing sequences, along with the "planner" code that query runners use to define the operators. That whole path can be seen in [this branch](https://github.com/paul-rogers/druid/tree/op-step2).

<hr>

This PR has:

- [X] been self-reviewed.
- [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] been tested in a test Druid cluster.
