[GitHub] [druid] paul-rogers commented on issue #12262: Multi-stage distributed queries

GitBox Thu, 14 Jul 2022 10:54:50 -0700


paul-rogers commented on issue #12262:
URL: https://github.com/apache/druid/issues/12262#issuecomment-1184737857


   > To me the key question is: what here can we keep as-is, what can be 
evolved, and what should be implemented fresh?
   
   Here's a crack at an answer, based on experience with other query engines 
and the Druid operator prototype:
   
   * Keep the native query engines at the segment level, as explained above.
   * Initially, keep the network implementation. It can be swapped out later.
   * Evolve the `QueryRunner`/`QueryToolChest` code to pull out the planning bits.
   * Evolve the `Sequence` operations (sort, merge, limit, etc.) to be 
operator-based.
   * Evolve the above to be a native query planner and scatter/gather execution 
engine.
   * Evolve the native query planner to generate multi-tier queries, which 
means introducing a node role to accept "fragments".
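
   To make the "operator-based" idea concrete, here is a minimal sketch of a 
pull-style operator chain. All names (`Operator`, `ListScan`, `LimitOperator`, 
`drain`) are hypothetical and chosen for illustration; this is not Druid's 
`Sequence` API or the prototype's code, just one shape such operators could take:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class OperatorSketch {
  /** Hypothetical operator contract: open to get rows, close to release resources. */
  public interface Operator<T> {
    Iterator<T> open();
    void close();
  }

  /** Leaf operator that replays an in-memory list; stands in for a segment scan. */
  public static final class ListScan<T> implements Operator<T> {
    private final List<T> rows;
    public ListScan(List<T> rows) { this.rows = rows; }
    @Override public Iterator<T> open() { return rows.iterator(); }
    @Override public void close() { }
  }

  /** Does one thing: passes through at most `limit` rows from its child. */
  public static final class LimitOperator<T> implements Operator<T> {
    private final Operator<T> child;
    private final long limit;

    public LimitOperator(Operator<T> child, long limit) {
      this.child = child;
      this.limit = limit;
    }

    @Override public Iterator<T> open() {
      final Iterator<T> in = child.open();
      return new Iterator<T>() {
        private long emitted = 0;
        @Override public boolean hasNext() { return emitted < limit && in.hasNext(); }
        @Override public T next() {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          emitted++;
          return in.next();
        }
      };
    }

    @Override public void close() { child.close(); }
  }

  /** Pulls every row from the root operator, then closes the whole chain. */
  public static <T> List<T> drain(Operator<T> root) {
    List<T> out = new ArrayList<>();
    try {
      Iterator<T> it = root.open();
      while (it.hasNext()) {
        out.add(it.next());
      }
    } finally {
      root.close();
    }
    return out;
  }

  public static void main(String[] args) {
    Operator<Integer> plan = new LimitOperator<>(new ListScan<>(List.of(1, 2, 3, 4, 5)), 3);
    System.out.println(drain(plan)); // prints [1, 2, 3]
  }
}
```

   The point of the shape is that a planner can assemble the chain up front, 
and each operator stays small enough to test on its own.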
   
   Once we have the basic distributed structure, evolved from current code, we 
are in a position to extend it:
   
   * Compress the `SQL --> logical plan --> native query --> native query 
planner --> operator DAG` structure to be the standard `SQL --> logical plan 
--> physical plan --> operator DAG` structure. Retain the native query path 
noted above.
   * Evolve operators to use frames instead of Java arrays.
   * Evolve exchanges to use advanced networking (async, multiplexed, say) 
instead of REST.
   * Add planner and operator support for joins, windowing, Favorite Feature X, 
etc.
   
   Somewhere around here, we then merge the current ingest-oriented MSQE path 
with the now-operator-based low-latency path:
   
   * MSQE frame processors evolve to be "fragments": a chain of operators, 
using frames, bounded by exchanges.
   * Ingest exchange operators are disk-based (S3, say) rather than the network 
shuffles used in low-latency.
   * MSQE and low-latency use the same operators where possible. Operators do 
one thing (limit, say) and are reusable.
   * Merge the MSQE controller workflow engine with the low-latency query 
engine where possible: both need to know about resources, segment locations, 
SELECT pipelines, etc.
   * Low-latency and MSQE runtimes are distinct. Low-latency is memory-based, 
running fragments on query nodes. Ingest runs as tasks managed by the Overlord, 
with disk-based shuffles.
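
   As a rough illustration of the list above, the sketch below runs the same 
operator chain as a "fragment" between two exchanges, once with an in-memory 
exchange (standing in for a low-latency network shuffle) and once with a 
file-backed exchange (standing in for disk or S3). Every name here 
(`Exchange`, `MemoryExchange`, `FileExchange`, `runFragment`, `limit`) is 
hypothetical, rows are plain strings rather than frames, and nothing in it 
reflects actual MSQE code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class FragmentSketch {
  /** Hypothetical exchange: the boundary between two fragments. */
  public interface Exchange {
    void write(List<String> rows);
    List<String> read();
  }

  /** Low-latency flavor: rows stay in memory. */
  public static final class MemoryExchange implements Exchange {
    private final List<String> buffer = new ArrayList<>();
    @Override public void write(List<String> rows) { buffer.addAll(rows); }
    @Override public List<String> read() { return new ArrayList<>(buffer); }
  }

  /** Ingest flavor: rows are written to, and read back from, a file. */
  public static final class FileExchange implements Exchange {
    private final Path file;
    public FileExchange(Path file) { this.file = file; }

    /** Convenience factory backed by a fresh temporary file. */
    public static FileExchange temp() {
      try {
        return new FileExchange(Files.createTempFile("exchange", ".txt"));
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    @Override public void write(List<String> rows) {
      try {
        Files.write(file, rows);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    @Override public List<String> read() {
      try {
        return Files.readAllLines(file);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Reusable operator: keep the first n rows. */
  public static Function<List<String>, List<String>> limit(int n) {
    return rows -> new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
  }

  /** A fragment: read the input exchange, apply the operators, write the output exchange. */
  public static void runFragment(
      Exchange in, Function<List<String>, List<String>> operators, Exchange out) {
    out.write(operators.apply(in.read()));
  }

  public static void main(String[] args) {
    List<String> input = List.of("a", "b", "c");

    Exchange memIn = new MemoryExchange();
    Exchange memOut = new MemoryExchange();
    memIn.write(input);
    runFragment(memIn, limit(2), memOut);

    Exchange fileIn = FileExchange.temp();
    Exchange fileOut = FileExchange.temp();
    fileIn.write(input);
    runFragment(fileIn, limit(2), fileOut);

    // Same operators, different exchanges, same rows out.
    System.out.println(memOut.read().equals(fileOut.read())); // prints true
  }
}
```

   Only the exchange implementation differs between the two runs; the 
operator chain is untouched, which is the reuse argued for above.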
   
   In short: on the low-latency side, the thought is to evolve what we have so 
that we can incrementally improve the bits we want to improve and add the 
pipeline features we want to add. That is, we remodel the house while living 
in it: refactor, don't rewrite, wherever possible. Recognize, as your note 
does, that what we have works, but is limited. Refactor the good parts, then 
add the new stuff. Move from one working system to another; never take it all 
apart, since reaching parity with the existing system would be a huge 
challenge if we tried to start over.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


