paul-rogers commented on issue #12262: URL: https://github.com/apache/druid/issues/12262#issuecomment-1190963186
A summary of the notes above might be that the multi-stage engine is a great addition. The existing "low-latency" queries are Druid's defining feature. We'd like to drag them both, kicking and screaming, into a standard optimizer/operator-DAG structure so we can exploit the abundant industry knowledge about how to optimize queries, and how to implement the Next Favorite Feature. The notes above are mostly about how we do this with low-latency queries.

The question we're discussing is whether the current batch multi-stage engine is a good foundation to expand into the low-latency query space. It is a great one, except that we have to replace the code that does exchanges (AKA shuffles: streaming rather than disk-based), task management (in-process rather than Overlord-based), failure strategy (fail & retry instead of fail & recover), and so on. There are similarities, of course, but the details differ. Low-latency queries are optimized for performance, batch queries for durability. Many of the code bits above the basic row operations will differ. (Spark differs from Presto, for example.) So we need two execution frameworks, but they should share a common foundation.

To evolve the multi-stage engine toward that common foundation, we'd want something like:

* Extend the "frame processor" idea to be a bit more like a "fragment" in many engines: `(scan | exchange) --> operator+ --> (scan | client)`.
* Express the query as a physical plan: a serialized form of a DAG of operator descriptions. Use this form rather than the complex set of ad-hoc native queries and knick-knack specs inherited from Druid's existing ingest engines.
* Pretty up the multi-stage query "planner" to emit a physical plan. Partition that plan to get the "work orders" sent to workers.
* Collapse the Calcite planner, Druid Query Maker, and multi-stage "planner" into a single planner.
Since the multi-stage engine emits (physical) operators, and Calcite manipulates (logical) operators, we can cut out (most of) the middlemen.

* Use Calcite rules, and Druid-specific transforms operating on the logical plan, to optimize the query, rather than the large amount of ad-hoc code currently used.

The key notion is that if we plan based on operators, we can leverage the vast industry knowledge (of things like relational algebra, how to apply cost estimations, etc.) to improve our planning process -- something which is very hard to do with the ad-hoc data structures Druid uses for native queries.

The above sketches paths to converge both the low-latency and multi-stage batch frameworks toward the industry-standard optimizer/operator-DAG architecture. As we do, we can look for points of synergy: for example, both should use frames, both should use the same operator implementations, and so on. Moreover, effort invested in the planner (metadata, statistics, etc.) benefits both low-latency and batch multi-stage queries.

In short, the answer to the question "how do we use multi-stage for low latency?" is that both frameworks evolve toward the desired end state of the optimizer/operator-DAG architecture. Most other engines are already there; we've just got us some catching up to do.
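To make the "physical plan as a DAG of operator descriptions" idea concrete, here is a minimal sketch in Java. All names (`OperatorDesc`, `Fragment`, `PhysicalPlan`) are illustrative inventions, not Druid's actual classes; the point is only the shape: each fragment is `(scan | exchange) --> operator+ --> (exchange | client)`, the DAG is cut into stages at exchange boundaries, and the "work order" sent to a worker would just be its fragment serialized.

```java
import java.util.List;

public class PhysicalPlanSketch {
    // An operator description: an id, a kind, and the ids of its inputs.
    // A real plan would carry per-operator config (filters, columns, etc.).
    public record OperatorDesc(int id, String kind, List<Integer> inputs) {}

    // A fragment groups the operators that run together in one stage.
    public record Fragment(int stage, List<OperatorDesc> operators) {}

    public record PhysicalPlan(List<Fragment> fragments) {
        // A hypothetical two-stage plan: a scan stage feeding a merge stage
        // across an exchange (shuffle) boundary.
        public static PhysicalPlan example() {
            Fragment scanStage = new Fragment(0, List.of(
                new OperatorDesc(1, "scan", List.of()),
                new OperatorDesc(2, "filter", List.of(1)),
                new OperatorDesc(3, "exchange-send", List.of(2))));
            Fragment mergeStage = new Fragment(1, List.of(
                new OperatorDesc(4, "exchange-receive", List.of(3)),
                new OperatorDesc(5, "sort-merge", List.of(4)),
                new OperatorDesc(6, "client-sink", List.of(5))));
            return new PhysicalPlan(List.of(scanStage, mergeStage));
        }
    }
}
```

Note that the two execution frameworks discussed above would differ only in how `exchange-send`/`exchange-receive` are implemented (streaming for low latency, disk-backed for durable batch); the plan description itself is shared.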
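And to illustrate why planning over operators enables rule-based optimization: once the query is a tree of operator descriptions, an optimization is just a pattern-matching rewrite applied over that tree. The sketch below is not Calcite's actual `RelOptRule` API (nor any Druid code); it is a toy single rule, `Filter(a, Filter(b, x)) => Filter(a AND b, x)`, applied bottom-up, to show the general mechanism.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RuleSketch {
    // A toy logical operator: a kind, an argument (predicate, table name,
    // etc.), and child operators.
    public record Op(String kind, String arg, List<Op> inputs) {}

    // One rewrite rule, applied recursively bottom-up over the tree:
    // collapse two stacked filters into a single conjunctive filter.
    public static Op mergeFilters(Op op) {
        List<Op> newInputs = op.inputs().stream()
            .map(RuleSketch::mergeFilters)
            .collect(Collectors.toList());
        Op rewritten = new Op(op.kind(), op.arg(), newInputs);
        if (rewritten.kind().equals("filter")
                && rewritten.inputs().size() == 1
                && rewritten.inputs().get(0).kind().equals("filter")) {
            Op child = rewritten.inputs().get(0);
            return new Op("filter",
                rewritten.arg() + " AND " + child.arg(),
                child.inputs());
        }
        return rewritten;
    }
}
```

In a real planner there would be many such rules (filter pushdown, projection pruning, join reordering) driven by cost estimates; the point is that each rule is small and composable, whereas equivalent logic over ad-hoc native-query specs must be hand-written per query type.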
