findingrish commented on code in PR #16842: URL: https://github.com/apache/druid/pull/16842#discussion_r1715798954
##########
docs/configuration/index.md:
##########
@@ -595,7 +595,9 @@ need arises.
 |`druid.centralizedDatasourceSchema.enabled`|Boolean flag for enabling datasource schema building in the Coordinator, this should be specified in the common runtime properties.|false|No.|
 |`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This config should be set when CentralizedDatasourceSchema feature is enabled. This should be specified in the MiddleManager runtime properties.|false|No.|
-For, stale schema cleanup configs, refer to properties with the prefix `druid.coordinator.kill.segmentSchema` in [Metadata Management](#metadata-management).
+If you enable this feature, you can query datasources that are only stored in cold storage and are not loaded on a Historical. For more information, see [Query from deep storage](../querying/query-from-deep-storage.md).

Review Comment:
```suggestion
If you enable this feature, you can query datasources that are only stored in deep storage and are not loaded on a Historical. For more information, see [Query from deep storage](../querying/query-from-deep-storage.md).
```

##########
docs/querying/query-from-deep-storage.md:
##########
@@ -28,13 +28,20 @@ Druid can query segments that are only stored in deep storage. Running a query f
 Query from deep storage requires the Multi-stage query (MSQ) task engine. Load the extension for it if you don't already have it enabled before you begin. See [enable MSQ](../multi-stage-query/index.md#load-the-extension) for more information.
+To be queryable, your datasource must meet one of the following conditions:
+
+- At least one segment from the datasource is loaded onto a Historical service for Druid to plan the query. This segment can be any segment from the datasource. You can verify that a datasource has at least one segment on a Historical service if it's visible in the Druid console.
+- You have the centralized data source schema feature enabled. For more information, see [Centralized datasource schema](../configuration/index.md#centralized-datasource-schema).

Review Comment:
```suggestion
- You have the centralized datasource schema feature enabled. For more information, see [Centralized datasource schema](../configuration/index.md#centralized-datasource-schema).
```

##########
docs/tutorials/tutorial-query-deep-storage.md:
##########
@@ -25,7 +25,7 @@ sidebar_label: "Query from deep storage"
 Query from deep storage allows you to query segments that are stored only in deep storage, which provides lower costs than if you were to load everything onto Historical processes. The tradeoff is that queries from deep storage may take longer to complete.
-This tutorial walks you through loading example data, configuring load rules so that not all the segments get loaded onto Historical processes, and querying data from deep storage.
+This tutorial walks you through loading example data, configuring load rules so that not all the segments get loaded onto Historical services, and querying data from deep storage. If you have [centralized datasource schema enabled](../configuration/index.md#centralized-datasource-schema), you can query datasources that are only in deep storage and don't need to make sure at least one segment is available on a Historical.

Review Comment:
```suggestion
This tutorial walks you through loading example data, configuring load rules so that not all the segments get loaded onto Historical services, and querying data from deep storage. If you have [centralized datasource schema enabled](../configuration/index.md#centralized-datasource-schema), you can query datasources that are only in deep storage without having any segment available on Historical.
```

##########
docs/querying/query-from-deep-storage.md:
##########
@@ -28,13 +28,20 @@ Druid can query segments that are only stored in deep storage. Running a query f
 Query from deep storage requires the Multi-stage query (MSQ) task engine. Load the extension for it if you don't already have it enabled before you begin. See [enable MSQ](../multi-stage-query/index.md#load-the-extension) for more information.
+To be queryable, your datasource must meet one of the following conditions:
+
+- At least one segment from the datasource is loaded onto a Historical service for Druid to plan the query. This segment can be any segment from the datasource. You can verify that a datasource has at least one segment on a Historical service if it's visible in the Druid console.
+- You have the centralized data source schema feature enabled. For more information, see [Centralized datasource schema](../configuration/index.md#centralized-datasource-schema).
+
+If you use centralized data source schemas, there's an additional step for any datasource created prior to enabling it to make the datasource queryable from deep storage. You need to load the cold segments onto a Historical so that the schema can be backfilled in the metadata database. You can load some or all of the segments that are only in deep storage. If you don't load all the segments, any dimensions that are only in the segments you didn't load will not be in the queryable datasource schema and won't be queryable from deep storage. That is, only the dimensions that are in the metadata database and the schema are queryable. Once that process is complete, you can unload all the segments from the Historical and only keep the data in deep storage.

Review Comment:
```suggestion
If you use centralized data source schema, there's an additional step for any datasource created prior to enabling it to make the datasource queryable from deep storage. You need to load the segments from deep storage onto a Historical so that the schema can be backfilled in the metadata database. You can load some or all of the segments that are only in deep storage. If you don't load all the segments, any dimensions that are only in the segments you didn't load will not be in the queryable datasource schema and won't be queryable from deep storage. That is, only the dimensions that are present in the segment schema in metadata database are queryable. Once that process is complete, you can unload all the segments from the Historical and only keep the data in deep storage.
```

##########
docs/querying/query-from-deep-storage.md:
##########
@@ -28,13 +28,20 @@ Druid can query segments that are only stored in deep storage. Running a query f
 Query from deep storage requires the Multi-stage query (MSQ) task engine. Load the extension for it if you don't already have it enabled before you begin. See [enable MSQ](../multi-stage-query/index.md#load-the-extension) for more information.
+To be queryable, your datasource must meet one of the following conditions:
+
+- At least one segment from the datasource is loaded onto a Historical service for Druid to plan the query. This segment can be any segment from the datasource. You can verify that a datasource has at least one segment on a Historical service if it's visible in the Druid console.
+- You have the centralized data source schema feature enabled. For more information, see [Centralized datasource schema](../configuration/index.md#centralized-datasource-schema).
+
+If you use centralized data source schemas, there's an additional step for any datasource created prior to enabling it to make the datasource queryable from deep storage. You need to load the cold segments onto a Historical so that the schema can be backfilled in the metadata database. You can load some or all of the segments that are only in deep storage. If you don't load all the segments, any dimensions that are only in the segments you didn't load will not be in the queryable datasource schema and won't be queryable from deep storage. That is, only the dimensions that are in the metadata database and the schema are queryable. Once that process is complete, you can unload all the segments from the Historical and only keep the data in deep storage.
+
 ## Keep segments in deep storage only
-Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the cost savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes.
+Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the cost savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes. If you use centralized data source schemas, a datasource can be kept only in deep storage but remain queryable.
-To do this, configure [load rules](../operations/rule-configuration.md#load-rules) to manage the which segments are only in deep storage and which get loaded onto Historical processes.
+To manage the which segments are kept only in deep storage and which get loaded onto Historical processes., configure [load rules](../operations/rule-configuration.md#load-rules)

Review Comment:
```suggestion
To manage which segments are kept only in deep storage and which get loaded onto Historical processes, configure [load rules](../operations/rule-configuration.md#load-rules)
```

##########
docs/querying/query-from-deep-storage.md:
##########
@@ -28,13 +28,20 @@ Druid can query segments that are only stored in deep storage. Running a query f
 Query from deep storage requires the Multi-stage query (MSQ) task engine. Load the extension for it if you don't already have it enabled before you begin. See [enable MSQ](../multi-stage-query/index.md#load-the-extension) for more information.
+To be queryable, your datasource must meet one of the following conditions:
+
+- At least one segment from the datasource is loaded onto a Historical service for Druid to plan the query. This segment can be any segment from the datasource. You can verify that a datasource has at least one segment on a Historical service if it's visible in the Druid console.
+- You have the centralized data source schema feature enabled. For more information, see [Centralized datasource schema](../configuration/index.md#centralized-datasource-schema).
+
+If you use centralized data source schemas, there's an additional step for any datasource created prior to enabling it to make the datasource queryable from deep storage. You need to load the cold segments onto a Historical so that the schema can be backfilled in the metadata database. You can load some or all of the segments that are only in deep storage. If you don't load all the segments, any dimensions that are only in the segments you didn't load will not be in the queryable datasource schema and won't be queryable from deep storage. That is, only the dimensions that are in the metadata database and the schema are queryable. Once that process is complete, you can unload all the segments from the Historical and only keep the data in deep storage.
+
 ## Keep segments in deep storage only
-Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the cost savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes.
+Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the cost savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes. If you use centralized data source schemas, a datasource can be kept only in deep storage but remain queryable.

Review Comment:
```suggestion
Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the cost savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes. If you use centralized datasource schema, a datasource can be kept only in deep storage but remain queryable.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
