gianm commented on PR #18176: URL: https://github.com/apache/druid/pull/18176#issuecomment-3030382511
> @clintropolis now that segments are being cached on-demand on historicals (not all pre-loaded), and intra-AZ requests are much faster than S3 reads (intra-AZ requests are typically XXXµs, whereas S3 is ~500 ms for single RTT not to mention the numerous RTT needed from downloading massive segments), is this moving towards the goal to make historicals more "stateless" and more of a distributed caching tier? i.e historical A can pull column C from segment D from historical B? Basically improve "hit" rate and spread query load by allowing historicals to pass segment data between each other?

Initially our thinking was to focus on partial fetches from S3, so we only have to fetch the columns actually needed for a query. From my PoV (@clintropolis may have his own opinion), the natural next step from there would be to have the Druid servers able to fetch from each other. That would make it more feasible to burst up compute for a short period of time (even ~seconds would make sense).

Btw, I would say Historicals are already "stateless" in that they are just a cache of immutable segments (backed by S3) + some compute. The initial work we're doing with virtual storage enables the cache to be populated on demand from S3 during a query, instead of needing to be populated before the query starts. This improves things in two main ways:

- Historicals are immediately usable after being booted up; they don't need to populate their cache before they are usable for queries.
- You can have more data in S3 than available disk on Historicals, if you like.

Later on, fetching data from each other would be a way to improve the latency of cache population, which would make the servers more responsive immediately following a cold start.
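To make the "cache populated on demand" idea concrete, here is a minimal Python sketch of a read-through cache keyed by (segment, column), assuming a byte budget standing in for local disk and LRU eviction. On a miss it tries an ordered list of sources, for example a peer server first and deep storage second, and fetches only the columns a query asks for. This is not Druid's virtual storage implementation: `OnDemandColumnCache`, `read_columns`, the fetcher callables, and the peer source are all hypothetical illustrations (peer fetching is described above only as a possible later step).

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical cache key: (segment_id, column_name).
Key = Tuple[str, str]
# Hypothetical fetcher: returns the column bytes, or None if the source lacks them.
Fetcher = Callable[[Key], Optional[bytes]]


class OnDemandColumnCache:
    """Read-through LRU cache of segment columns bounded by a byte budget.

    Illustrative only; not Druid's API. Sources are tried in order on a miss.
    """

    def __init__(self, capacity_bytes: int, sources: Sequence[Tuple[str, Fetcher]]):
        self.capacity_bytes = capacity_bytes
        self.sources = list(sources)
        self._entries: "OrderedDict[Key, bytes]" = OrderedDict()
        self._used = 0
        self.fetch_log: list = []  # (source_name, key) for every miss served

    def get(self, key: Key) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        for name, fetch in self.sources:
            data = fetch(key)
            if data is not None:
                self.fetch_log.append((name, key))
                self._put(key, data)
                return data
        raise KeyError(key)

    def _put(self, key: Key, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # larger than the whole budget: serve it without caching
        while self._used + len(data) > self.capacity_bytes:
            _, evicted = self._entries.popitem(last=False)  # drop least recently used
            self._used -= len(evicted)
        self._entries[key] = data
        self._used += len(data)

    def read_columns(self, segment_id: str, columns: Sequence[str]) -> Dict[str, bytes]:
        # Partial fetch: only the columns the query needs are loaded.
        return {c: self.get((segment_id, c)) for c in columns}


# Example: a peer holds one column; deep storage holds all of them.
deep_storage = {("seg1", "__time"): b"t" * 40, ("seg1", "country"): b"c" * 30}
peer = {("seg1", "__time"): b"t" * 40}
cache = OnDemandColumnCache(80, [("peer", peer.get), ("deep_storage", deep_storage.get)])
cache.read_columns("seg1", ["__time", "country"])
print(cache.fetch_log)
# [('peer', ('seg1', '__time')), ('deep_storage', ('seg1', 'country'))]
```

Because the byte budget can be smaller than the total data in deep storage, the sketch also reflects the "more data in S3 than available disk" point: cold columns are simply evicted and re-fetched on the next miss.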
