samarthjain commented on issue #12262: URL: https://github.com/apache/druid/issues/12262#issuecomment-1054567811
Thank you for the proposal, Gian! Building a DAG-based query execution model seems like the next logical step for Druid, and I am excited to see progress on that front. Below are some ad hoc comments:

1) Build for resilience - as queries get more complex, the chance of one failing because of a network blip or bad hardware goes up. As we build these new capabilities, we should design in resilience, including the ability to restart stages, exponential backoff during network partitions, speculative execution of a certain percentage of tasks, etc.

2) Resource fairness - currently Druid has limited support for ensuring resource fairness beyond query laning. As the queries Druid supports get more complex, they will push the limits of memory, disk, CPU, and network. Ensuring fair resource usage, so that queries do not starve each other or affect overall system stability, will be critical, IMHO.

3) Scaling and decoupling of shuffle servers - since shuffle servers will be mostly stateless apart from storing intermediate query state, it makes sense to me to run them as independent servers that do not serve any other Druid functionality. This would make it easier to scale them up.

4) A UI to show the query DAG and plan would be good to have.
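To make point 1 a bit more concrete, here is a minimal sketch of restarting a failed stage with capped exponential backoff. It is only an illustration under stated assumptions: `StageRetry`, `backoffMillis`, and `runWithBackoff` are hypothetical names and are not part of Druid or of the proposal, and a stage is modeled simply as a `Callable`.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, NOT a Druid API: re-runs a failed stage with capped exponential backoff.
final class StageRetry {
    private StageRetry() {}

    /** Delay before retry number `attempt` (0-based): baseMillis doubled `attempt` times, capped at maxMillis. */
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        // Stop doubling once the cap is reached, so the value cannot grow without bound.
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Runs `stage` up to maxAttempts times, sleeping between failed attempts; rethrows the last failure. */
    public static <T> T runWithBackoff(Callable<T> stage, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return stage.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

A caller might wrap a stage as `StageRetry.runWithBackoff(() -> runStage(stageId), 5, 100, 10_000)`, where `runStage` and `stageId` are likewise placeholders. A real implementation would probably add random jitter to the delay and retry only failures known to be transient.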
