Re: [PR] docs: update query from deepstorage segment requirement (druid)

via GitHub Mon, 19 Aug 2024 23:21:30 -0700


findingrish commented on code in PR #16842:
URL: https://github.com/apache/druid/pull/16842#discussion_r1715798954



##########
docs/configuration/index.md:
##########
@@ -595,7 +595,9 @@ need arises.
 |`druid.centralizedDatasourceSchema.enabled`|Boolean flag for enabling 
datasource schema building in the Coordinator, this should be specified in the 
common runtime properties.|false|No.|
 |`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This 
config should be set when CentralizedDatasourceSchema feature is enabled. This 
should be specified in the MiddleManager runtime properties.|false|No.|
 
-For, stale schema cleanup configs, refer to properties with the prefix 
`druid.coordinator.kill.segmentSchema` in [Metadata 
Management](#metadata-management).
+If you enable this feature, you can query datasources that are only stored in 
cold storage and are not loaded on a Historical. For more information, see 
[Query from deep storage](../querying/query-from-deep-storage.md).

Review Comment:
   ```suggestion
   If you enable this feature, you can query datasources that are only stored 
in deep storage and are not loaded on a Historical. For more information, see 
[Query from deep storage](../querying/query-from-deep-storage.md).
   ```



##########
docs/querying/query-from-deep-storage.md:
##########
@@ -28,13 +28,20 @@ Druid can query segments that are only stored in deep 
storage. Running a query f
 
 Query from deep storage requires the Multi-stage query (MSQ) task engine. Load 
the extension for it if you don't already have it enabled before you begin. See 
[enable MSQ](../multi-stage-query/index.md#load-the-extension) for more 
information.
 
+To be queryable, your datasource must meet one of the following conditions:
+
+- At least one segment from the datasource is loaded onto a Historical service 
for Druid to plan the query. This segment can be any segment from the 
datasource. You can verify that a datasource has at least one segment on a 
Historical service if it's visible in the Druid console.
+- You have the centralized data source schema feature enabled. For more 
information, see [Centralized datasource 
schema](../configuration/index.md#centralized-datasource-schema).

Review Comment:
   ```suggestion
   - You have the centralized datasource schema feature enabled. For more 
information, see [Centralized datasource 
schema](../configuration/index.md#centralized-datasource-schema).
   ```



##########
docs/tutorials/tutorial-query-deep-storage.md:
##########
@@ -25,7 +25,7 @@ sidebar_label: "Query from deep storage"
 
 Query from deep storage allows you to query segments that are stored only in 
deep storage, which provides lower costs than if you were to load everything 
onto Historical processes. The tradeoff is that queries from deep storage may 
take longer to complete. 
 
-This tutorial walks you through loading example data, configuring load rules 
so that not all the segments get loaded onto Historical processes, and querying 
data from deep storage.
+This tutorial walks you through loading example data, configuring load rules 
so that not all the segments get loaded onto Historical services, and querying 
data from deep storage. If you have [centralized datasource schema 
enabled](../configuration/index.md#centralized-datasource-schema), you can 
query datasources that are only in deep storage and don't need to make sure at 
least one segment is available on a Historical.

Review Comment:
   ```suggestion
   This tutorial walks you through loading example data, configuring load rules 
so that not all the segments get loaded onto Historical services, and querying 
data from deep storage. If you have [centralized datasource schema 
enabled](../configuration/index.md#centralized-datasource-schema), you can 
query datasources that are only in deep storage without having any segment 
available on a Historical.
   ```



##########
docs/querying/query-from-deep-storage.md:
##########
@@ -28,13 +28,20 @@ Druid can query segments that are only stored in deep 
storage. Running a query f
 
 Query from deep storage requires the Multi-stage query (MSQ) task engine. Load 
the extension for it if you don't already have it enabled before you begin. See 
[enable MSQ](../multi-stage-query/index.md#load-the-extension) for more 
information.
 
+To be queryable, your datasource must meet one of the following conditions:
+
+- At least one segment from the datasource is loaded onto a Historical service 
for Druid to plan the query. This segment can be any segment from the 
datasource. You can verify that a datasource has at least one segment on a 
Historical service if it's visible in the Druid console.
+- You have the centralized data source schema feature enabled. For more 
information, see [Centralized datasource 
schema](../configuration/index.md#centralized-datasource-schema).
+
+If you use centralized data source schemas, there's an additional step for any 
datasource created prior to enabling it to make the datasource queryable from 
deep storage. You need to load the cold segments onto a Historical so that the 
schema can be backfilled in the metadata database. You can load some or all of 
the segments that are only in deep storage. If you don't load all the segments, 
any dimensions that are only in the segments you didn't load will not be in the 
queryable datasource schema and won't be queryable from deep storage. That is, 
only the dimensions that are in the metadata database and the schema are 
queryable. Once that process is complete, you can unload all the segments from 
the Historical and only keep the data in deep storage.

Review Comment:
   ```suggestion
   If you use centralized datasource schema, there's an additional step for 
any datasource created prior to enabling it to make the datasource queryable 
from deep storage. You need to load the segments from deep storage onto a 
Historical so that the schema can be backfilled in the metadata database. You 
can load some or all of the segments that are only in deep storage. If you 
don't load all the segments, any dimensions that are only in the segments you 
didn't load will not be in the queryable datasource schema and won't be 
queryable from deep storage. That is, only the dimensions that are present in 
the segment schema in the metadata database are queryable. Once that process 
is complete, you can unload all the segments from the Historical and only keep 
the data in deep storage.
   ```
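
To illustrate the backfill rule in the suggestion above, here is a toy model in Python. It is not Druid's implementation: the function name, segment IDs, and dimension names are all hypothetical. It only shows the set logic, where the queryable schema is the union of the dimensions found in the segments that were loaded for backfill.

```python
from typing import Dict, Iterable, Set


def queryable_dimensions(
    segment_dims: Dict[str, Set[str]], backfilled: Iterable[str]
) -> Set[str]:
    """Union of dimensions across the segments that were loaded for backfill.

    Toy model only: segment_dims maps a hypothetical segment ID to the
    dimensions stored in that segment.
    """
    result: Set[str] = set()
    for segment_id in backfilled:
        result |= segment_dims[segment_id]
    return result


# Hypothetical datasource with two segments; the newer one has an extra dimension.
segments = {
    "seg_2023": {"channel", "page"},
    "seg_2024": {"channel", "page", "countryName"},
}

partial = queryable_dimensions(segments, ["seg_2023"])  # only one segment loaded
full = queryable_dimensions(segments, segments.keys())  # every segment loaded
missing = full - partial  # dimensions that would not be queryable

print(sorted(partial))
print(sorted(missing))
```

In this model, loading only `seg_2023` leaves `countryName` out of the queryable schema, which matches the behavior the suggested wording describes.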



##########
docs/querying/query-from-deep-storage.md:
##########
@@ -28,13 +28,20 @@ Druid can query segments that are only stored in deep 
storage. Running a query f
 
 Query from deep storage requires the Multi-stage query (MSQ) task engine. Load 
the extension for it if you don't already have it enabled before you begin. See 
[enable MSQ](../multi-stage-query/index.md#load-the-extension) for more 
information.
 
+To be queryable, your datasource must meet one of the following conditions:
+
+- At least one segment from the datasource is loaded onto a Historical service 
for Druid to plan the query. This segment can be any segment from the 
datasource. You can verify that a datasource has at least one segment on a 
Historical service if it's visible in the Druid console.
+- You have the centralized data source schema feature enabled. For more 
information, see [Centralized datasource 
schema](../configuration/index.md#centralized-datasource-schema).
+
+If you use centralized data source schemas, there's an additional step for any 
datasource created prior to enabling it to make the datasource queryable from 
deep storage. You need to load the cold segments onto a Historical so that the 
schema can be backfilled in the metadata database. You can load some or all of 
the segments that are only in deep storage. If you don't load all the segments, 
any dimensions that are only in the segments you didn't load will not be in the 
queryable datasource schema and won't be queryable from deep storage. That is, 
only the dimensions that are in the metadata database and the schema are 
queryable. Once that process is complete, you can unload all the segments from 
the Historical and only keep the data in deep storage.
+
 ## Keep segments in deep storage only
 
-Any data you ingest into Druid is already stored in deep storage, so you don't 
need to perform any additional configuration from that perspective. However, to 
take advantage of the cost savings that querying from deep storage provides, 
make sure not all your segments get loaded onto Historical processes.
+Any data you ingest into Druid is already stored in deep storage, so you don't 
need to perform any additional configuration from that perspective. However, to 
take advantage of the cost savings that querying from deep storage provides, 
make sure not all your segments get loaded onto Historical processes. If you 
use centralized data source schemas, a datasource can be kept only in deep 
storage but remain queryable.
 
-To do this, configure [load 
rules](../operations/rule-configuration.md#load-rules) to manage the which 
segments are only in deep storage and which get loaded onto Historical 
processes.
+To manage the which segments are kept only in deep storage and which get 
loaded onto Historical processes., configure [load 
rules](../operations/rule-configuration.md#load-rules) 

Review Comment:
   ```suggestion
   To manage which segments are kept only in deep storage and which get loaded 
onto Historical processes, configure [load 
rules](../operations/rule-configuration.md#load-rules).
   ```



##########
docs/querying/query-from-deep-storage.md:
##########
@@ -28,13 +28,20 @@ Druid can query segments that are only stored in deep 
storage. Running a query f
 
 Query from deep storage requires the Multi-stage query (MSQ) task engine. Load 
the extension for it if you don't already have it enabled before you begin. See 
[enable MSQ](../multi-stage-query/index.md#load-the-extension) for more 
information.
 
+To be queryable, your datasource must meet one of the following conditions:
+
+- At least one segment from the datasource is loaded onto a Historical service 
for Druid to plan the query. This segment can be any segment from the 
datasource. You can verify that a datasource has at least one segment on a 
Historical service if it's visible in the Druid console.
+- You have the centralized data source schema feature enabled. For more 
information, see [Centralized datasource 
schema](../configuration/index.md#centralized-datasource-schema).
+
+If you use centralized data source schemas, there's an additional step for any 
datasource created prior to enabling it to make the datasource queryable from 
deep storage. You need to load the cold segments onto a Historical so that the 
schema can be backfilled in the metadata database. You can load some or all of 
the segments that are only in deep storage. If you don't load all the segments, 
any dimensions that are only in the segments you didn't load will not be in the 
queryable datasource schema and won't be queryable from deep storage. That is, 
only the dimensions that are in the metadata database and the schema are 
queryable. Once that process is complete, you can unload all the segments from 
the Historical and only keep the data in deep storage.
+
 ## Keep segments in deep storage only
 
-Any data you ingest into Druid is already stored in deep storage, so you don't 
need to perform any additional configuration from that perspective. However, to 
take advantage of the cost savings that querying from deep storage provides, 
make sure not all your segments get loaded onto Historical processes.
+Any data you ingest into Druid is already stored in deep storage, so you don't 
need to perform any additional configuration from that perspective. However, to 
take advantage of the cost savings that querying from deep storage provides, 
make sure not all your segments get loaded onto Historical processes. If you 
use centralized data source schemas, a datasource can be kept only in deep 
storage but remain queryable.

Review Comment:
   ```suggestion
   Any data you ingest into Druid is already stored in deep storage, so you 
don't need to perform any additional configuration from that perspective. 
However, to take advantage of the cost savings that querying from deep storage 
provides, make sure not all your segments get loaded onto Historical processes. 
If you use centralized datasource schema, a datasource can be kept only in deep 
storage but remain queryable.
   ```
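
For context on what "remain queryable" means in practice, here is a minimal sketch of the request body for a query from deep storage. It assumes the SQL statements endpoint (`/druid/v2/sql/statements`) with `executionMode` set to `ASYNC` in the query context. The helper name and the datasource name are hypothetical, and the sketch only builds and prints the payload without sending it.

```python
import json

# Path of the SQL statements API used for queries from deep storage.
STATEMENTS_PATH = "/druid/v2/sql/statements"


def build_deep_storage_request(sql: str) -> dict:
    """Build the JSON body for an asynchronous query from deep storage.

    Hypothetical helper: it only assembles the payload.
    """
    return {"query": sql, "context": {"executionMode": "ASYNC"}}


# "example_datasource" is a placeholder name, not one from the PR.
body = build_deep_storage_request('SELECT COUNT(*) FROM "example_datasource"')

print("POST", STATEMENTS_PATH)
print(json.dumps(body, indent=2))
```

You would send this body to the Router or Broker of your own cluster. The host and port are left out because they depend on your deployment.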



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


