paul-rogers commented on issue #12262: URL: https://github.com/apache/druid/issues/12262#issuecomment-1183512847
@clintropolis asked:

> What is the migration path to something like that? Build the new thing side-by-side I guess?

For the optimizer/operator prototype, the approach taken was to evolve the current code. First, replace sequences with operators in a way that the two interoperate. Then, we'd find that query runners do nothing other than call a "planner function" that produces an operator, so we can gather that code into a low-level "native query planner" similar to the controller work manager @gianm created.

Once we've converted the query runner/sequence stack to a native query planner plus operators, we can, if we're confident, just ship that. (Operators are designed to be easy to test, unlike query runners and sequences, so we can more easily ensure quality.) Or, if we are cautious, we offer both: the `QueryLifecycle` goes down either the current path or the new path, returning results the same way regardless of the engine used.

From there, we can extend the planner for more use cases: multi-tier distribution, distributed joins, etc. We can also combine planner levels. Currently we have Calcite, the native query maker, and the query runners; with MSQ, we also have the controller's work-allocation code. After the above step we'd have Calcite, the native query maker, and the native query planner in one path, and Calcite, the MSQ query maker, and the controller work allocator in the ingest path. We can then begin to merge the planners so that we plan the SQL query, not the native query. That removes barriers to what we can offer users: "classic" queries are still fast, but BI queries become possible without overloading the Broker. Calcite already knows about windowed aggregations; we need only add a windowing operator and we're (mostly) good to go.

Over time, we could, if there were reason, end up with the common solution: Calcite for query parsing, analysis, and query optimization, plus a Druid-specific layer to convert Calcite rels into the equivalent Druid operators.
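To make the interop idea concrete, here is a minimal sketch of what a sequence-to-operator "shim" could look like. The `Operator`, `IteratorShim`, and `LimitOperator` names are hypothetical illustrations, not Druid's actual API; an `Iterable` stands in for Druid's `Sequence` to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical minimal operator protocol: open a row stream, close resources.
interface Operator<T> {
  Iterator<T> open();
  void close();
}

// Shim: adapts an existing iterator-style source (standing in for a Sequence)
// into an Operator, so old and new code can interoperate during migration.
class IteratorShim<T> implements Operator<T> {
  private final Iterable<T> source;
  IteratorShim(Iterable<T> source) { this.source = source; }
  @Override public Iterator<T> open() { return source.iterator(); }
  @Override public void close() { /* nothing to release in this sketch */ }
}

// A simple downstream operator that consumes rows from its child,
// showing how operators compose into a pipeline.
class LimitOperator<T> implements Operator<T> {
  private final Operator<T> child;
  private final int limit;
  LimitOperator(Operator<T> child, int limit) { this.child = child; this.limit = limit; }
  @Override public Iterator<T> open() {
    final Iterator<T> it = child.open();
    return new Iterator<T>() {
      int count = 0;
      @Override public boolean hasNext() { return count < limit && it.hasNext(); }
      @Override public T next() { count++; return it.next(); }
    };
  }
  @Override public void close() { child.close(); }
}
```

Because each operator only sees its child's `open()`/`close()` contract, a shim-wrapped legacy source and a native operator are indistinguishable to the rest of the pipeline, which is what makes the incremental migration possible.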
Finally, execution frameworks optimized for low latency (in-memory processing) or for ingest (disk-based processing). The operators are the same; only the exchanges differ on the data-pipeline path. (The execution management, resource management, error handling, and other aspects would differ, but the data pipeline is mostly blissfully ignorant of them.)

While rearranging the planner, we should be able to move items one at a time: they become operators to Calcite, with the remaining native query planner acting as the big, complex operator we have today in `DruidQueryRel`, where a native query is a Calcite relational operator.

We could also merge the MSQ efforts. Once sequences are replaced with operators, the operators can change to use frames (rather than Java arrays) as their storage format. Since operators are very flexible, during the transition some operators can use frames while others continue to use arrays. "Shim" operators handle the translation, the same way that, in the sequence-to-operator transition, shims handle the sequence-to-operator and operator-to-sequence conversions. We'd want to remove the shims in the end, but they do allow an incremental approach to development.

Similarly, if we want to change how we do shuffles, we can more easily do so with operators. In an operator DAG, an "exchange operator" physically comprises two parts: a sender and a receiver. The query stack doesn't really care how these work, as long as the planner inserts matched pairs. We can start with the existing HTTP scatter/gather. Then, we add a new exchange operator based on frames, or async frames, or a connection-based solution, or multiplexed connection-based frames; or, for batch MSQ, one based on external files. The "Lego" nature of operators means that the rest of the DAG doesn't care, as long as the sender consumes rows from "downstream" and the receiver provides them "upstream".
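The matched sender/receiver pairing can be sketched as follows. This is a hypothetical illustration, not Druid's exchange code: the "transport" here is an in-process queue, but HTTP scatter/gather, frames, or external files could sit behind the same two halves without the neighboring operators noticing.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical exchange pair: the planner inserts the sender on the producing
// side and the receiver on the consuming side; the rest of the DAG sees only
// ordinary row iterators on each side of the transport.
class Exchange<T> {
  private static final Object EOF = new Object();
  private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);

  // Sender half: drains its upstream row stream into the transport.
  void send(Iterator<T> upstream) throws InterruptedException {
    while (upstream.hasNext()) {
      queue.put(upstream.next());
    }
    queue.put(EOF); // signal end-of-stream to the receiver
  }

  // Receiver half: exposes the transport as an ordinary row iterator.
  Iterator<T> receive() {
    return new Iterator<T>() {
      Object pending = take();
      Object take() {
        try { return queue.take(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
      }
      @Override public boolean hasNext() { return pending != EOF; }
      @SuppressWarnings("unchecked")
      @Override public T next() {
        if (pending == EOF) throw new NoSuchElementException();
        Object row = pending;
        pending = take();
        return (T) row;
      }
    };
  }
}
```

Swapping the `BlockingQueue` for a network channel or a spill file changes only this class; as long as the planner keeps the two halves paired, the surrounding operators are untouched.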
Experience with other tools has shown that, with the operator-based approach, each operator can be evolved and tested in isolation, allowing us to safely incorporate MSQ functionality step by step. Similarly, as long as two planner versions produce the same plans for the same queries, we can be sure that we've not broken anything. If version X is supposed to add new features, we can easily verify that only the target plans changed, and everything else remains the same. We don't need to actually execute the queries to verify the planner: we just compare "physical plans". There was a PR that started down this path: #12545, though it was closed for now to focus efforts elsewhere. That PR was based on a technique which both Drill and Impala used with great success. The point is, others have worked out solutions for safely evolving an optimizer/operator-based query engine. We can shamelessly borrow those ideas where they are useful.
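The plan-comparison idea reduces to a simple diff over serialized plans. A minimal sketch, assuming each planner version can dump a stable text form of every query's physical plan (the `PlanDiff` class and map-of-plan-text representation are illustrative, not the #12545 implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plan-regression check: run the same query corpus through two
// planner versions, serialize each physical plan to text, and report which
// queries' plans changed. No query is executed; only plan text is compared.
class PlanDiff {
  // Maps each changed query id to its {old plan, new plan} pair.
  static Map<String, String[]> changedPlans(
      Map<String, String> oldPlans, Map<String, String> newPlans) {
    Map<String, String[]> changed = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : oldPlans.entrySet()) {
      String updated = newPlans.get(e.getKey());
      if (!e.getValue().equals(updated)) {
        changed.put(e.getKey(), new String[]{e.getValue(), updated});
      }
    }
    return changed;
  }
}
```

A release that adds, say, window support would then assert that `changedPlans` contains exactly the windowed queries and nothing else, which is the "only the target plans changed" check described above.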
