317brian commented on code in PR #14609: URL: https://github.com/apache/druid/pull/14609#discussion_r1273842217
##########
docs/design/architecture.md:
##########
@@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been ingested into the system
 storage accessible by every Druid server. In a clustered deployment, this is typically a distributed object store like S3 or HDFS, or a network mounted filesystem. In a single-server deployment, this is typically local disk.
-Druid uses deep storage only as a backup of your data and as a way to transfer data in the background between
-Druid processes. Druid stores data in files called _segments_. Historical processes cache data segments on
-local disk and serve queries from that cache as well as from an in-memory cache.
-This means that Druid never needs to access deep storage
-during a query, helping it offer the best query latencies possible. It also means that you must have enough disk space
-both in deep storage and across your Historical servers for the data you plan to load.
+Druid uses deep storage for the following purposes:
+
+- As a backup of your data, including those that get loaded onto Historical processes.
+- As a way to transfer data in the background between
+Druid processes. Druid stores data in files called _segments_.
+- As the source data for queries that run against segments stored only in deep storage and not in Historical processes as determined by your load rules.
+
+Historical processes cache data segments on
+local disk and serve queries from that cache as well as from an in-memory cache. Segments on disk for Historical processes provide the low latency querying performance Druid is known for. You can query directly from deep storage though, which allows you to query segments that exist only in deep storage. This trades some performance to provide you with the ability to query more of your data without necessarily having to scale your Historical processes.
+
+When determining sizing for your storage, keep the following in mind:
+
+- Deep storage needs to be able to hold all the data that you ingest into Druid
+- On disk storage for Historical processes need to be able to accommodate the data you want to load onto them to run queries on data you access frequently and need low latency for

Review Comment:
```suggestion
- On disk storage for Historical processes need to be able to accommodate the data you want to load onto them to run queries. The data on Historical processes should be data you access frequently and need to run low latency queries for.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
