clintropolis commented on issue #12262: URL: https://github.com/apache/druid/issues/12262#issuecomment-1183027800
> * No, or reduced, need for server thread pool size and connection pool size tuning. Today the server thread pool sizes on Brokers and data servers, and the Broker-to-data-server connection pool size, dictate the number of queries that can run concurrently. Each query needs to acquire a connection from the Broker-to-data-server pool, and a server thread from the Broker and all relevant data servers, in order to run. High QPS workloads require tuning these three parameters. Too low and the hardware isn't maxed out; too high and the system can suffer from excessive memory use or context switching. It'd be better to arrive at good throughput and stability without needing to adjust these parameters.
> * In the out-of-box configuration, query priorities apply only to the processing thread pools. They don't apply to the resources mentioned in the prior bullet. So, it's possible for low priority long-running queries to starve out high priority queries. Query laning helps with this, but it isn't configured out of the box, and the 429 error codes make it more difficult to write clients since retries are necessary. I think we can come up with a better solution. I suspect we'll want to decouple server-to-server queries from the http request/response structure.

I think these are both artifacts of not using async HTTP handling, which would let us manage the queue of query requests ourselves? I believe async HTTP request handling for queries would actually be required as long as we want to maintain the current HTTP request API model for interactive queries. Async HTTP handling was a longer-term follow-up goal from the laning stuff that I sort of forgot about and never got back to 😅, but it is something I had in mind: the 429 would be replaced with feeding query requests into a lane-specific queue to wait for a processing slot, with the lanes themselves prioritized.
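
To make the queueing idea concrete, here is a minimal sketch of it, not Druid code. The class `LaneQueueSketch`, its methods, and the use of an integer priority per lane are all hypothetical names chosen for illustration; a real version would sit behind async request handling and hold the suspended request rather than a query id string.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Optional;
import java.util.Queue;
import java.util.TreeMap;

/** Hypothetical sketch: lane-specific wait queues instead of a 429 rejection. */
public class LaneQueueSketch {
    // Lanes keyed by priority, iterated highest priority first.
    // Each lane is a FIFO queue of waiting query ids.
    private final TreeMap<Integer, Queue<String>> lanes =
        new TreeMap<>(Comparator.reverseOrder());

    // Number of processing slots that are currently free.
    private int freeSlots;

    public LaneQueueSketch(int processingSlots) {
        this.freeSlots = processingSlots;
    }

    /** Enqueue the query in its lane; it waits for a slot instead of being rejected. */
    public synchronized void submit(int lanePriority, String queryId) {
        lanes.computeIfAbsent(lanePriority, p -> new ArrayDeque<>()).add(queryId);
    }

    /**
     * If a slot is free, hand it to the oldest query in the highest-priority
     * non-empty lane. Returns empty when no slot is free or nothing is waiting.
     */
    public synchronized Optional<String> startNext() {
        if (freeSlots == 0) {
            return Optional.empty();
        }
        for (Queue<String> lane : lanes.values()) {
            String next = lane.poll();
            if (next != null) {
                freeSlots--;
                return Optional.of(next);
            }
        }
        return Optional.empty();
    }

    /** Release a slot when a running query completes. */
    public synchronized void finish() {
        freeSlots++;
    }
}
```

Strict priority ordering like this can starve the lower lanes under sustained high-priority load, so a real scheduler would likely also need per-lane capacity limits or some aging of waiting queries.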
Likewise, I imagine the broker -> historical pools could be growable if we didn't also need to use them to control overall load, since the historicals would be controlling their own load and open requests wouldn't be blocking anything anywhere.

> * Query types today are monolithic and handle many functions internally (like aggregation, ordering, limiting) in a fixed structure. Splitting these up into smaller logical operators would simplify planning and make execution more flexible. It will also allow us to factor the execution code into more modular units. Some discussion in https://github.com/apache/druid/pull/12641.

What is the migration path to something like that? Build the new thing side-by-side, I guess? There is a fair bit of discussion in the proposal and some indication of doing it in-place, but unless I missed it, details on how it might actually be done incrementally for the more optimized/complicated engines are a bit light, and it doesn't really seem settled in the comments. A lot of the discussion and the prototype are in terms of the scan query, which is a bit too simple to prove anything imo.

Given the heavy differences with how stuff currently works, the only non-dangerous way to do this seems to be rebuilding, side-by-side, pretty much everything above the segment and segment-like reading stuff (selectors/filters/indexes/cursors/etc), one query type at a time? Otherwise the amount of proving necessary to ensure, for example, that branches caused by feature flags in potentially performance-critical paths aren't impacting the existing query processing feels like it would be too much. Maybe I'm being too cautious though?

All that said, it sounds nice if we can work it out to be the same or better performance than the existing engines.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
