paul-rogers commented on issue #12262: URL: https://github.com/apache/druid/issues/12262#issuecomment-1184737857
> To me the key question is: what here can we keep as-is, what can be evolved, and what should be implemented fresh? Here's a crack based on experience with other query engines, and the Druid operator prototype: * Keep the native query engines at the segment level, as explained above. * Initially, keep the network implementation. It can be swapped out later. * Evolve the `QueryRunner`/`QueryToolBox` code to pull out the planning bits. * Evolve the `Sequence` operations (sort, merge, limit, etc.) to be operator-based. * Evolve the above to be a native query planner and scatter/gather execution engine. * Evolve the native query planner to generate multi-tier queries, which means introducing a node role to accept "fragments". Once we have the basic distributed structure, evolved from current code, we are in position to extend it: * Compress the `SQL --> logical plan --> native query --> native query planner --> operator DAG` structure to be the standard `SQL --> logical plan --> physical plan --> operator DAG` structure. Retain the native query path noted above. * Evolve operators to use frames instead of Java arrays. * Evolve exchanges to use advanced networking (async, multiplexed, say) instead of REST. * Add planner and operator support for joins, windowing, Favorite Feature X, etc. Somewhere around here, we then merge the current ingest-oriented MSQE path with the now-operator-based low-latency path: * MSQE frame processors evolve to be "fragments": a chain of operators, using frames, bounded by exchanges. * Ingest exchange operators are disk (S3) based rather than network shuffles as in low-latency. * MSQE and low-latency use the same operators where possible. Operators do one thing (limit, say) and are reusable. * Merge the MSQE controller workflow engine with the low-latency query engine where possible: both need to know about resources, segment locations, SELECT pipelines, etc. * Low-latency and MSQE runtimes are distinct. Low-latency is memory based, running fragments on query nodes. Ingest runs tasks within the Overlord, with disk-based shuffles. In short: on the low latency side, the thought is to evolve what we have to put us in a position to incrementally improve those bits we want to improve, and to add pipeline features we want to add. That is, we remodel the house while living in it: refactor, don't rewrite, wherever possible. Recognize, as your note does, that what we have works, but is limited. Refactor the good parts, then add the new stuff. Move from one working system to another, never take it all apart since we'd find that reaching parity with the existing system would be a huge challenge if we try to start over. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
