paul-rogers commented on PR #13580: URL: https://github.com/apache/druid/pull/13580#issuecomment-1358151289
I believe you'll find that a push-based model doesn't scale well to larger DAGs, or to those with branches. The push model has been tried several times in industry, and it tends to run into the same problems each time. For push to work well, one needs a Storm-like model: one has a DAG, but each operator is separated from the others by a buffer, and each operator runs asynchronously. That kind of model is a natural fit for Go, but not such a good fit for Java. One could recreate such a model by modernizing the Storm approach. Even so, one would find it is often overkill: operators within each fragment in a distributed system tend to be simple, and the added complexity of push/async is generally not worth the cost.

It is not clear what stack issue you face. In most systems that use operators, the call stack in a stack trace is plenty to see where a problem occurs. Seeing the call stack from the other direction simply tells you what you already know: the lower part of the DAG. What you actually seem to want is a way to understand the data flow. The call stack is a poor tool for that. Tracing batches works better.

As a general editorial comment, this series of PRs seems to be defining an operator model somewhat in isolation from previous work in this area, and separate from considerations at the SQL planner level. A key reason to introduce operators is to allow a uniform planner/optimizer/execution engine structure based on the battle-tested, industry-standard operator DAG model. We can certainly innovate in the details (row-based or columnar? sync or async? memory management? etc.). However, it may not be worth our effort to re-invent the basics (push vs. pull, the relationship between the exec engine and the planner, etc.). The effort that goes into such re-invention would be better spent using the known good model and just cranking out new capabilities.
(Here I'm thinking of that odd feature known as the "join.") I would suggest we have a bit more of a discussion about goals and methods.
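
To make the "tracing batches" point concrete, here is a minimal sketch of a pull-style operator chain in which a pass-through wrapper records every batch that crosses an edge of the DAG. All names here (`PullSketch`, `Operator`, `ScanOperator`, `EvenFilterOperator`, `TraceOperator`) are hypothetical and chosen only for illustration; this is not Druid's operator API and not the code in this PR, and the "batch" is assumed to be a plain list of integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: hypothetical names, not Druid's operator API.
class PullSketch {

    // Pull-style contract: the consumer calls next() until it returns null.
    interface Operator {
        List<Integer> next(); // next batch, or null at end of data
    }

    // Leaf operator that hands out pre-built batches one at a time.
    static final class ScanOperator implements Operator {
        private final Iterator<List<Integer>> batches;

        ScanOperator(List<List<Integer>> batches) {
            this.batches = batches.iterator();
        }

        @Override
        public List<Integer> next() {
            return batches.hasNext() ? batches.next() : null;
        }
    }

    // Keeps even values; pulls from its child until it has a non-empty
    // batch to return or the child reports end of data.
    static final class EvenFilterOperator implements Operator {
        private final Operator child;

        EvenFilterOperator(Operator child) {
            this.child = child;
        }

        @Override
        public List<Integer> next() {
            List<Integer> in;
            while ((in = child.next()) != null) {
                List<Integer> out = new ArrayList<>();
                for (int v : in) {
                    if (v % 2 == 0) {
                        out.add(v);
                    }
                }
                if (!out.isEmpty()) {
                    return out;
                }
            }
            return null;
        }
    }

    // Pass-through wrapper that logs each batch seen on one DAG edge.
    static final class TraceOperator implements Operator {
        private final String label;
        private final Operator child;
        private final List<String> log;

        TraceOperator(String label, Operator child, List<String> log) {
            this.label = label;
            this.child = child;
            this.log = log;
        }

        @Override
        public List<Integer> next() {
            List<Integer> batch = child.next();
            log.add(batch == null ? label + " EOF" : label + " rows=" + batch.size());
            return batch;
        }
    }

    // Builds scan -> trace -> filter -> trace, drains it, and returns the trace.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Operator scan = new ScanOperator(Arrays.asList(
            Arrays.asList(1, 2, 3, 4),
            Arrays.asList(5, 7),
            Arrays.asList(6)));
        Operator root = new TraceOperator(
            "filter",
            new EvenFilterOperator(new TraceOperator("scan", scan, log)),
            log);
        while (root.next() != null) {
            // A real consumer would process the batch here.
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The resulting trace shows the data flow along each edge (for instance, that the second scan batch was consumed without producing any filter output), which a call stack taken from either direction would not reveal.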
