Re: [PR] experimental virtual storage fabric mode for historicals (druid)

via GitHub Wed, 02 Jul 2025 19:59:54 -0700


gianm commented on PR #18176:
URL: https://github.com/apache/druid/pull/18176#issuecomment-3030382511


   > @clintropolis now that segments are being cached on-demand on historicals 
(not all pre-loaded), and intra-AZ requests are much faster than S3 reads 
(intra-AZ requests are typically XXXµs, whereas S3 is ~500 ms for a single RTT, 
not to mention the numerous RTTs needed to download massive segments), is 
this moving towards the goal of making historicals more "stateless" and more of a 
distributed caching tier? i.e., historical A can pull column C from segment D 
from historical B? Basically improve "hit" rate and spread query load by 
allowing historicals to pass segment data between each other?
   
   Initially our thinking was to focus on partial fetches from S3, so we only 
have to fetch the columns actually needed for a query. From my PoV 
(@clintropolis may have his own opinion), the next step from there would be to 
make the Druid servers able to fetch from each other. That would make it more 
feasible to burst up compute for a short period of time (even a few seconds 
would make sense).
   
   Btw, I would say Historicals are already "stateless" in that they are just a 
cache of immutable segments (backed by S3) + some compute. The initial work 
we're doing with virtual storage is enabling the cache to be populated on 
demand from S3 during a query, instead of needing to be populated before the 
query starts. This improves things in two main ways:
   
   - Historicals are immediately usable after being booted up; they don't need 
to populate their cache before they are usable for queries
   - You can have more data in S3 than available disk on Historicals, if you 
like
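
To make the two points above concrete, here is a minimal sketch of a cache that is populated on demand and can front more data than it has room for. It is not the Druid implementation: `OnDemandColumnCache`, the `(segment_id, column)` key shape, and the `fetch` callback standing in for an S3 read are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical cache key: (segment_id, column_name).
Key = Tuple[str, str]


class OnDemandColumnCache:
    """LRU cache that loads a column from deep storage on first use."""

    def __init__(self, capacity_bytes: int, fetch: Callable[[str, str], bytes]):
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch  # stand-in for a partial read from deep storage
        self.used_bytes = 0
        self.fetch_count = 0
        self.entries: "OrderedDict[Key, bytes]" = OrderedDict()

    def get(self, segment_id: str, column: str) -> bytes:
        key = (segment_id, column)
        if key in self.entries:
            # Cache hit: mark as most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]

        # Cache miss: populate during the query rather than ahead of time.
        data = self.fetch(segment_id, column)
        self.fetch_count += 1
        self.entries[key] = data
        self.used_bytes += len(data)

        # Evict least recently used entries until we fit again, but never
        # evict the entry that was just loaded.
        while self.used_bytes > self.capacity_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        return data
```

A cache like this starts empty and is usable immediately, and because it evicts, the backing store can hold more than `capacity_bytes` of data.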
   
   Later on, fetching data from each other would be a way to improve the 
latency of cache population, which will have the effect of making the servers 
more responsive immediately following a cold start.
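
The peer-fetch idea can be sketched as a lookup order: ask other servers first, and fall back to deep storage only if none of them has the data. This is an illustration under assumptions, not a description of any existing Druid API; `fetch_column`, the `peers` callables, and the `deep_storage` callable are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical peer lookup: returns the column bytes, or None if the peer
# does not have that column cached.
PeerFetcher = Callable[[str, str], Optional[bytes]]


def fetch_column(
    segment_id: str,
    column: str,
    peers: Iterable[PeerFetcher],
    deep_storage: Callable[[str, str], bytes],
) -> Tuple[bytes, str]:
    """Return (data, source), preferring a peer's cache over deep storage."""
    for peer in peers:
        data = peer(segment_id, column)
        if data is not None:
            return data, "peer"
    # No peer had it cached: pay the slower deep storage read.
    return deep_storage(segment_id, column), "deep_storage"
```

Deep storage stays the source of truth here; peers only shorten the path when one of them already holds the column.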


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


