[GitHub] [druid] paul-rogers commented on pull request #13580: Push operators

GitBox Mon, 19 Dec 2022 11:29:04 -0800


paul-rogers commented on PR #13580:
URL: https://github.com/apache/druid/pull/13580#issuecomment-1358151289


   I believe you'll find that a push-based model doesn't scale well to larger 
DAGs, or those with branches. The push model has been tried several times in 
industry, and it tends to run into the same problems each time. For push to 
work well, one needs a Storm-like model with buffers between operators. In 
such a model, one still has a DAG, but each operator is separated from its 
neighbors by a buffer and runs asynchronously. That kind of model is a natural 
fit for Go, but not such a good fit for Java. One could recreate it by 
modernizing the Storm approach. Even so, it is often overkill: the operators 
within each fragment of a distributed system tend to be simple, and the added 
complexity of push/async is generally not worth the cost.
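
   To make the buffered model concrete, here is a minimal sketch, assuming one 
thread per operator and a bounded queue between neighbors. All names here 
(`BufferedPushSketch`, `stage`, `run`) are hypothetical; this is neither Storm's 
API nor code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical illustration only: not Druid, Storm, or PR code.
final class BufferedPushSketch {
    // End-of-stream marker, compared by identity.
    static final Object EOS = new Object();

    // One asynchronous operator: takes from its input buffer, transforms the
    // row, and pushes the result into the buffer of the next operator.
    static Thread stage(BlockingQueue<Object> in, BlockingQueue<Object> out,
                        Function<Integer, Integer> fn) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Object item = in.take();
                    if (item == EOS) {
                        out.put(EOS); // propagate end-of-stream downstream
                        return;
                    }
                    out.put(fn.apply((Integer) item));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Wires source -> (+1) -> (*2) -> sink, with a bounded buffer on each edge.
    static List<Integer> run(List<Integer> rows) {
        BlockingQueue<Object> a = new ArrayBlockingQueue<>(4);
        BlockingQueue<Object> b = new ArrayBlockingQueue<>(4);
        BlockingQueue<Object> c = new ArrayBlockingQueue<>(4);
        Thread s1 = stage(a, b, x -> x + 1);
        Thread s2 = stage(b, c, x -> x * 2);
        Thread source = new Thread(() -> {
            try {
                for (Integer r : rows) {
                    a.put(r); // blocks when the buffer is full (back-pressure)
                }
                a.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        source.start();
        List<Integer> out = new ArrayList<>();
        try {
            // The caller acts as the sink and drains the last buffer.
            for (Object item = c.take(); item != EOS; item = c.take()) {
                out.add((Integer) item);
            }
            source.join();
            s1.join();
            s2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return out;
    }
}
```

   Calling `BufferedPushSketch.run(List.of(1, 2, 3))` returns `[4, 6, 8]`. Even 
this two-operator toy needs three threads, three buffers, an end-of-stream 
protocol, and interrupt handling, which is the overhead referred to above.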
   
   It is not clear what stack issue you face. In most systems that use 
operators, an ordinary stack trace is plenty to see where a problem occurs. 
Seeing the call stack from the other direction simply tells you what you 
already know: the lower part of the DAG.
   
   What you actually seem to want is a way to understand the data flow. The 
call stack is a poor tool for that. Tracing batches works better.
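
   As a rough sketch of what batch tracing could look like in a pull model, 
assume a hypothetical `BatchOperator` interface whose `next()` returns a batch, 
or `null` at end of input. None of these names come from Druid or from this PR. 
A wrapper operator records what crosses each edge of the DAG:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical pull-style operator: returns the next batch, or null at EOF.
interface BatchOperator {
    List<Integer> next();
}

// Leaf operator that replays a fixed list of batches.
final class ListSource implements BatchOperator {
    private final Iterator<List<Integer>> batches;

    ListSource(List<List<Integer>> batches) {
        this.batches = batches.iterator();
    }

    public List<Integer> next() {
        return batches.hasNext() ? batches.next() : null;
    }
}

// Keeps only the even values of each batch it pulls from its child.
final class EvenFilter implements BatchOperator {
    private final BatchOperator child;

    EvenFilter(BatchOperator child) {
        this.child = child;
    }

    public List<Integer> next() {
        List<Integer> in = child.next();
        if (in == null) {
            return null;
        }
        List<Integer> out = new ArrayList<>();
        for (int v : in) {
            if (v % 2 == 0) {
                out.add(v);
            }
        }
        return out;
    }
}

// Wraps any operator and logs each batch that flows out of it.
final class Tracing implements BatchOperator {
    private final String label;
    private final BatchOperator child;
    private final List<String> log;
    private int batchNo = 0;

    Tracing(String label, BatchOperator child, List<String> log) {
        this.label = label;
        this.child = child;
        this.log = log;
    }

    public List<Integer> next() {
        List<Integer> batch = child.next();
        String what = (batch == null) ? "EOF" : batch.size() + " rows";
        log.add(label + " batch " + batchNo + ": " + what);
        batchNo++;
        return batch;
    }
}

final class BatchTraceSketch {
    // Builds source -> filter, tracing both edges, and drains the root.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        BatchOperator source = new Tracing(
            "source", new ListSource(List.of(List.of(1, 2, 3), List.of(4, 5))), log);
        BatchOperator root = new Tracing("filter", new EvenFilter(source), log);
        while (root.next() != null) {
            // Discard results; only the trace matters here.
        }
        return log;
    }
}
```

   `BatchTraceSketch.run()` yields six entries, starting with 
`source batch 0: 3 rows` and `filter batch 0: 1 rows`. The trace shows how many 
rows cross each edge, which a call stack cannot show in either direction.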
   
   As a general editorial comment, this series of PRs seems to be defining an 
operator model somewhat in isolation from previous work in this area, and 
separately from considerations at the SQL planner level. A key reason to 
introduce operators is to allow a uniform planner/optimizer/execution engine 
structure based on the battle-tested, industry-standard operator DAG model. We 
can certainly innovate in the details (row-based or columnar, sync or async, 
memory management, etc.). However, it may not be worth our effort to 
re-invent the basics (push vs. pull, the relationship between the exec engine 
and the planner, etc.). The effort that goes into such re-invention would be 
better spent using the known good model and just cranking out new capabilities. 
(Here I'm thinking of that odd feature known as the "join.")
   
   I would suggest we have a bit more of a discussion about goals and methods.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

