Re: [PR] docs: query from deep storage (druid)

via GitHub Tue, 25 Jul 2023 09:57:20 -0700


317brian commented on code in PR #14609:
URL: https://github.com/apache/druid/pull/14609#discussion_r1273842217



##########
docs/design/architecture.md:
##########
@@ -70,12 +70,20 @@ Druid uses deep storage to store any data that has been 
ingested into the system
 storage accessible by every Druid server. In a clustered deployment, this is 
typically a distributed object store like S3 or
 HDFS, or a network mounted filesystem. In a single-server deployment, this is 
typically local disk.
 
-Druid uses deep storage only as a backup of your data and as a way to transfer 
data in the background between
-Druid processes. Druid stores data in files called _segments_. Historical 
processes cache data segments on
-local disk and serve queries from that cache as well as from an in-memory 
cache.
-This means that Druid never needs to access deep storage
-during a query, helping it offer the best query latencies possible. It also 
means that you must have enough disk space
-both in deep storage and across your Historical servers for the data you plan 
to load.
+Druid uses deep storage for the following purposes:
+
+- As a backup of your data, including those that get loaded onto Historical 
processes.
+- As a way to transfer data in the background between
+Druid processes. Druid stores data in files called _segments_. 
+- As the source data for queries that run against segments stored only in deep 
storage and not in Historical processes as determined by your load rules.
+
+Historical processes cache data segments on
+local disk and serve queries from that cache as well as from an in-memory 
cache. Segments on disk for Historical processes provide the low latency 
querying performance Druid is known for. You can query directly from deep 
storage though, which allows you to query segments that exist only in deep 
storage. This trades some performance to provide you with the ability to query 
more of your data without necessarily having to scale your Historical processes.
+
+When determining sizing for your storage, keep the following in mind:
+
+- Deep storage needs to be able to hold all the data that you ingest into Druid
+- On disk storage for Historical processes need to be able to accommodate the 
data you want to load onto them to run queries on data you access frequently 
and need low latency for

Review Comment:
   ```suggestion
   - On-disk storage for Historical processes needs to be able to accommodate 
the data you want to load onto them to run queries. The data on Historical 
processes should be data you access frequently and need low-latency 
queries for. 
   ```
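
The two sizing bullets in the hunk can be read as simple arithmetic. The following is a minimal sketch, not anything from the PR or from Druid itself: `estimate_storage`, `StorageEstimate`, and the `replicas` parameter are hypothetical names for illustration, and the calculation ignores compression, indexing overhead, and free-space headroom.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageEstimate:
    """Rough capacity figures in gigabytes (hypothetical helper type)."""
    deep_storage_gb: int
    historical_disk_gb: int


def estimate_storage(total_ingested_gb: int, hot_data_gb: int, replicas: int = 2) -> StorageEstimate:
    """Estimate capacity for deep storage and for Historical disks.

    Assumptions (not taken from the PR):
    - deep storage holds one copy of everything that was ingested;
    - Historical disks hold `replicas` copies of the frequently queried
      ("hot") subset that the load rules place on them.
    """
    if total_ingested_gb < 0 or hot_data_gb < 0 or replicas < 0:
        raise ValueError("sizes and replica count must be non-negative")
    if hot_data_gb > total_ingested_gb:
        raise ValueError("hot data cannot exceed the total ingested data")
    return StorageEstimate(
        deep_storage_gb=total_ingested_gb,
        historical_disk_gb=hot_data_gb * replicas,
    )
```

For example, `estimate_storage(1000, 200, replicas=2)` suggests about 1000 GB of deep storage and 400 GB of Historical disk under these assumptions; with `replicas=0` the data would be queryable only from deep storage and need no Historical disk.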



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


