paul-rogers commented on issue #12262: URL: https://github.com/apache/druid/issues/12262#issuecomment-1184702375
> To me the key question is: what here can we keep as-is, what can be evolved, and what should be implemented fresh?

@gianm, thanks for the breakdown of areas! To answer the above question, I'd ask a different one: what are we trying to solve? We only need to change that part.

To answer that, let's do a thought experiment. Suppose we are Pinot, and we use Presto to do non-trivial queries. What would that look like? We'd give Presto enough information that Presto could distribute segment reads to its workers. Each worker would reach out to a historical with a native query. Presto would try to push as much work as possible into the native query so that Presto receives the minimal amount of data from each historical. That requires work in the Presto planner to work out distribution (which historicals have which segments, how to distribute that work to Presto nodes, and what can be pushed down to the historicals). The Presto leaf operators would send native queries to historicals to do the "scan" work. From there, Presto would implement the rest of the query: grouping, joins, windowing, and so on.

Apache Drill (and I think Presto?) have Druid connectors, but the Drill one uses JDBC, which single-threads the reads through the Broker. It would be cool to add the above logic to the Druid connectors for Drill and Presto so we could see how the leading query tools would handle Druid queries. Spark could do the same, but for batch operations.

The Druid solution, of course, will do it all, but we can use the same approach. The planner works out the distribution model and distributes work to historicals. Rather than having the Broker work with historicals directly, we can have a tier of worker nodes do that work: aggregate or merge results, shuffle to other nodes, etc. We'd do that new work if we want to support queries beyond scatter/gather. (Simple queries reduce to scatter/gather: the Broker does the work when it is a simple merge.)
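To make the pushdown idea concrete, here is a minimal Python sketch of the leaf-planning step: group segments by the historical that serves them, and emit one native scan query per historical so that filtering happens on the data nodes. This is illustrative only, not Druid's actual planner code; `plan_leaf_queries`, the segment-location map, and the `context.segments` hint are hypothetical, though `queryType`, `dataSource`, `intervals`, and `filter` are real fields of a native scan query.

```python
from collections import defaultdict

def plan_leaf_queries(datasource, interval, filter_spec, segment_locations):
    """Build one native scan query per historical.

    segment_locations: dict of segment_id -> "host:port" of the historical
    serving that segment (in practice this would come from cluster metadata).
    """
    by_historical = defaultdict(list)
    for segment_id, host in segment_locations.items():
        by_historical[host].append(segment_id)

    queries = {}
    for host, segment_ids in by_historical.items():
        queries[host] = {
            "queryType": "scan",
            "dataSource": datasource,
            "intervals": [interval],
            "filter": filter_spec,  # pushed down: the historical filters rows
            # Hypothetical hint restricting the query to this node's segments.
            "context": {"segments": sorted(segment_ids)},
        }
    return queries

plan = plan_leaf_queries(
    "wikipedia",
    "2016-06-27/2016-06-28",
    {"type": "selector", "dimension": "channel", "value": "#en.wikipedia"},
    {"seg-1": "hist-a:8083", "seg-2": "hist-b:8083", "seg-3": "hist-a:8083"},
)
```

With this shape, the distributed engine (whether Presto, Drill, or a new Druid tier) only has to merge what comes back; the expensive row-level work stays on the historicals.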
The key point here is that, even with a DAG, ultimately we want to push as much work as possible onto the historicals: filtering, per-segment aggregation, etc. We already have native query code that does that. So, expediency says we just create a physical plan whose leaf nodes hold the per-historical (or per-segment) native query, which we then ship to the historical (using REST or some fancy new networking solution).

This means that the part we'd want to change is the distributed portion of the query engine, not the part that runs on data nodes. Since the query runner/sequence structure isn't a good fit for a distributed DAG, we can instead refactor that code into an optimizer and operators, with a distribution framework. Crucially, the data nodes need not change: they don't need to care whether they are talking to the "classic" scatter/gather model, Presto, Drill, Spark, or the new multi-tier engine that happens to use an optimizer and operator DAG.

The above was validated in the operator prototype mentioned earlier: one variation refactored the Broker to be operator based and sent native queries to the historicals as a form of pushing operations into the input source, just as we do in Drill.
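The "leaf nodes hold a native query" structure can be sketched as a tiny operator protocol. This is an illustrative Python sketch with hypothetical names (`Operator`, `NativeQueryScan`, `Merge`), not the prototype's actual classes; the point is that the leaf operator is the only piece that knows about historicals, so the data nodes see an ordinary native query regardless of which engine is driving them.

```python
class Operator:
    """Minimal open/next/close protocol for a distributed query DAG."""
    def open(self): ...
    def next(self): ...   # returns a batch of rows, or None when exhausted
    def close(self): ...

class NativeQueryScan(Operator):
    """Leaf: ships a native query to one historical and streams its results."""
    def __init__(self, historical, native_query, send):
        self.historical = historical
        self.native_query = native_query
        self.send = send          # injected transport (REST or otherwise)
        self.batches = None

    def open(self):
        self.batches = iter(self.send(self.historical, self.native_query))

    def next(self):
        return next(self.batches, None)

    def close(self):
        self.batches = None

class Merge(Operator):
    """Interior node: drains its children (simple concatenation here;
    a real engine would do ordered merges, grouping, shuffles, etc.)."""
    def __init__(self, children):
        self.children = children

    def open(self):
        for c in self.children:
            c.open()

    def next(self):
        for c in self.children:
            batch = c.next()
            if batch is not None:
                return batch
        return None

    def close(self):
        for c in self.children:
            c.close()

# Fake transport for illustration: pretend each historical returns one batch.
fake_results = {
    "hist-a:8083": [[{"channel": "#en", "rows": 10}]],
    "hist-b:8083": [[{"channel": "#en", "rows": 7}]],
}
send = lambda host, query: fake_results[host]

dag = Merge([NativeQueryScan(h, {"queryType": "scan"}, send)
             for h in ("hist-a:8083", "hist-b:8083")])
dag.open()
batches = []
while (b := dag.next()) is not None:
    batches.append(b)
dag.close()
```

Swapping `send` for a real REST client is the only change needed to run against live historicals, which is exactly why the data nodes don't have to care who is calling them.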
