vtlim commented on code in PR #16585: URL: https://github.com/apache/druid/pull/16585#discussion_r1635465857
##########
docs/configuration/index.md:
##########
@@ -670,14 +670,9 @@ Store task logs in S3. Note that the `druid-s3-extensions` extension must be loa
 
 ##### Azure Blob Store task logs
 
-Store task logs in Azure Blob Store.
+Store task logs in Azure Blob Store. Note that the `druid-azure-extensions` extension must be loaded, and that the same storage account (and authentication method) as the deep storage module is used (`druid.azure.account`).

Review Comment:
```suggestion
Store task logs in Azure Blob Store. To enable this feature, load the `druid-azure-extensions` extension, and configure deep storage for Azure. Druid uses the same authentication method configured for deep storage and stores task logs in the same storage account (set in `druid.azure.account`).
```

##########
docs/development/extensions-core/azure.md:
##########
@@ -22,25 +22,75 @@ title: "Microsoft Azure"
 ~ under the License.
 -->
 
+## Azure extension
+
+This extension allows you to do three things:
+
+* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage.
+* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage.
+* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage.

Review Comment:
```suggestion
* [Persist task logs](#persist-task-logs-in-azure) to Azure Blob Storage for long-term storage.
```

##########
docs/configuration/index.md:
##########
@@ -670,14 +670,9 @@ Store task logs in S3. Note that the `druid-s3-extensions` extension must be loa
 
 ##### Azure Blob Store task logs
 
-Store task logs in Azure Blob Store.
+Store task logs in Azure Blob Store. Note that the `druid-azure-extensions` extension must be loaded, and that the same storage account (and authentication method) as the deep storage module is used (`druid.azure.account`).
 
-Note: The `druid-azure-extensions` extension must be loaded, and this uses the same storage account as the deep storage module for azure.
- -|Property|Description|Default| Review Comment: I like the idea of consolidating information in one place, but given that this section has properties for S3, GCS, etc., I wonder if we should leave in the Azure table for consistency ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. --> +## Azure extension + +This extension allows you to do three things: + +* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage. +* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage. +* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage. + +:::info To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list. -## Deep Storage - -[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration. - -|Property|Description|Possible Values|Default| -|--------|---------------|-----------|-------| -|`druid.storage.type`|azure||Must be set.| -|`druid.azure.account`||Azure Storage account name.|Must be set.| -|`druid.azure.key`||Azure Storage account key.|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.| -|`druid.azure.sharedAccessStorageToken`||Azure Shared Storage access token|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain..| -|`druid.azure.useAzureCredentialsChain`|Use [DefaultAzureCredential](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable) for authentication|Optional. 
Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.|False| -|`druid.azure.managedIdentityClientId`|If you want to use managed identity authentication in the `DefaultAzureCredential`, `useAzureCredentialsChain` must be true.||Optional.| -|`druid.azure.container`||Azure Storage container name.|Must be set.| -|`druid.azure.prefix`|A prefix string that will be prepended to the blob names for the segments published to Azure deep storage| |""| -|`druid.azure.protocol`|the protocol to use|http or https|https| -|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3| -|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024| -|`druid.azure.storageAccountEndpointSuffix`| The endpoint suffix to use. Use this config instead of `druid.azure.endpointSuffix`. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api). This config supports storage accounts enabled for [AzureDNSZone](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal). Note: do not include the storage account name prefix in this config value. | Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`| `blob.core.windows.net`| -See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information. +::: + +### Ingest data from Azure + +Ingest data using either [MSQ](../../multi-stage-query/index.md) or a native batch [parallel task](../../ingestion/native-batch.md) with an [Azure input source](../../ingestion/input-sources.md#azure-input-source) (`azureStorage`) to read objects directly from Azure Blob Storage. + +### Store segments in Azure + +:::info + +To enable Azure for deep storage, explicitly enable it by setting `druid.storage.type=azure`. 
+ +::: + +#### Configuration - Location + +Use the following configuration to setup the where to store segments: + +| Property | Description | Default | +|---|---|---| +| `druid.azure.account` | The Azure Storage account name. | Must be set. | +| `druid.azure.container` | The Azure Storage container name. | Must be set. | +| `druid.azure.prefix` | A prefix string that will be prepended to the blob names for the segments published. | "" | +| `druid.azure.maxTries` | Number of tries before canceling an Azure operation. | 3 | +| `druid.azure.protocol` | The protocol to use to connect to the Azure Storage account. Either `http` or `https`. | `https` | +| `druid.azure.storageAccountEndpointSuffix` | The Storage account endpoint to use. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api) or storage accounts with [Azure DNS zone endpoints](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview#azure-dns-zone-endpoints-preview).<br/><br/>Do _not_ include the storage account name prefix in this config value.<br/><br/>Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`. | `blob.core.windows.net` | + +#### Configuration - Authentication + +The Azure extension currently supports authenticating with either an [SAS Token](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview), a [Shared Key](https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key), or by using the default Azure credentials chain ([`DefaultAzureCredential`](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme#defaultazurecredential)). Use the following configuration to use either of these options: + +:::info + +One authentication needs to be provided. Set one of `sharedAccessStorageToken`, `key` or `useAzureCredentialsChain`. 
+ +::: Review Comment: ```suggestion Authenticate access to Azure Blob Storage using one of the following methods: * [SAS token](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview) * [Shared Key](https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key) * Default Azure credentials chain ([`DefaultAzureCredential`](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme#defaultazurecredential)). Configure authentication using the following properties: ::: ``` ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. --> +## Azure extension + +This extension allows you to do three things: + +* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage. +* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage. +* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage. + +:::info To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list. -## Deep Storage - -[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration. - -|Property|Description|Possible Values|Default| -|--------|---------------|-----------|-------| -|`druid.storage.type`|azure||Must be set.| -|`druid.azure.account`||Azure Storage account name.|Must be set.| -|`druid.azure.key`||Azure Storage account key.|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.| -|`druid.azure.sharedAccessStorageToken`||Azure Shared Storage access token|Optional. 
Set one of key, sharedAccessStorageToken or useAzureCredentialsChain..| -|`druid.azure.useAzureCredentialsChain`|Use [DefaultAzureCredential](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable) for authentication|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.|False| -|`druid.azure.managedIdentityClientId`|If you want to use managed identity authentication in the `DefaultAzureCredential`, `useAzureCredentialsChain` must be true.||Optional.| -|`druid.azure.container`||Azure Storage container name.|Must be set.| -|`druid.azure.prefix`|A prefix string that will be prepended to the blob names for the segments published to Azure deep storage| |""| -|`druid.azure.protocol`|the protocol to use|http or https|https| -|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3| -|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024| -|`druid.azure.storageAccountEndpointSuffix`| The endpoint suffix to use. Use this config instead of `druid.azure.endpointSuffix`. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api). This config supports storage accounts enabled for [AzureDNSZone](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal). Note: do not include the storage account name prefix in this config value. | Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`| `blob.core.windows.net`| -See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information. 
+::: + +### Ingest data from Azure + +Ingest data using either [MSQ](../../multi-stage-query/index.md) or a native batch [parallel task](../../ingestion/native-batch.md) with an [Azure input source](../../ingestion/input-sources.md#azure-input-source) (`azureStorage`) to read objects directly from Azure Blob Storage. + +### Store segments in Azure + +:::info + +To enable Azure for deep storage, explicitly enable it by setting `druid.storage.type=azure`. + +::: + +#### Configuration - Location + +Use the following configuration to setup the where to store segments: + +| Property | Description | Default | +|---|---|---| +| `druid.azure.account` | The Azure Storage account name. | Must be set. | +| `druid.azure.container` | The Azure Storage container name. | Must be set. | +| `druid.azure.prefix` | A prefix string that will be prepended to the blob names for the segments published. | "" | +| `druid.azure.maxTries` | Number of tries before canceling an Azure operation. | 3 | +| `druid.azure.protocol` | The protocol to use to connect to the Azure Storage account. Either `http` or `https`. | `https` | +| `druid.azure.storageAccountEndpointSuffix` | The Storage account endpoint to use. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api) or storage accounts with [Azure DNS zone endpoints](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview#azure-dns-zone-endpoints-preview).<br/><br/>Do _not_ include the storage account name prefix in this config value.<br/><br/>Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`. 
| `blob.core.windows.net` | + +#### Configuration - Authentication + +The Azure extension currently supports authenticating with either an [SAS Token](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview), a [Shared Key](https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key), or by using the default Azure credentials chain ([`DefaultAzureCredential`](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme#defaultazurecredential)). Use the following configuration to use either of these options: + +:::info + +One authentication needs to be provided. Set one of `sharedAccessStorageToken`, `key` or `useAzureCredentialsChain`. + +::: + +| Property | Description | Default | +|---|---|---| +| `druid.azure.sharedAccessStorageToken` | The SAS (Shared Storage Access) token. | | +| `druid.azure.key` | The Shared Key. | | +| `druid.azure.useAzureCredentialsChain` | If `true`, use `DefaultAzureCredential` for authentication. | `false` | +| `druid.azure.managedIdentityClientId` | To use managed identity authentication in the `DefaultAzureCredential`, set `useAzureCredentialsChain` to `true` and provide the client ID here. | | + +### Persist task logs in Azure + +:::info + +To enable Azure for persisting task logs, explicitly enable it by setting `druid.indexer.logs.type=azure`. + +::: + +Task logs are persisted using the account and authentication method configured for storing segments. Use the following configuration to setup where to store the task logs: + +| Property | Description | Default | +|---|---|---| +| `druid.indexer.logs.container` | The Azure Blob Store container to write logs to. | Must be set. | +| `druid.indexer.logs.prefix` | The path to prepend to logs. | Must be set. | + +For configuration options regarding task retention, see the generic options [here](../../configuration/index.md#log-retention-policy). 
Review Comment: ```suggestion For general options regarding task retention, see [Log retention policy](../../configuration/index.md#log-retention-policy). ``` ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. --> +## Azure extension + +This extension allows you to do three things: + +* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage. +* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage. +* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage. + +:::info To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list. -## Deep Storage - -[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration. - -|Property|Description|Possible Values|Default| -|--------|---------------|-----------|-------| -|`druid.storage.type`|azure||Must be set.| -|`druid.azure.account`||Azure Storage account name.|Must be set.| -|`druid.azure.key`||Azure Storage account key.|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.| -|`druid.azure.sharedAccessStorageToken`||Azure Shared Storage access token|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain..| -|`druid.azure.useAzureCredentialsChain`|Use [DefaultAzureCredential](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable) for authentication|Optional. 
Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.|False| -|`druid.azure.managedIdentityClientId`|If you want to use managed identity authentication in the `DefaultAzureCredential`, `useAzureCredentialsChain` must be true.||Optional.| -|`druid.azure.container`||Azure Storage container name.|Must be set.| -|`druid.azure.prefix`|A prefix string that will be prepended to the blob names for the segments published to Azure deep storage| |""| -|`druid.azure.protocol`|the protocol to use|http or https|https| -|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3| -|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024| -|`druid.azure.storageAccountEndpointSuffix`| The endpoint suffix to use. Use this config instead of `druid.azure.endpointSuffix`. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api). This config supports storage accounts enabled for [AzureDNSZone](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal). Note: do not include the storage account name prefix in this config value. | Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`| `blob.core.windows.net`| -See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information. +::: + +### Ingest data from Azure + +Ingest data using either [MSQ](../../multi-stage-query/index.md) or a native batch [parallel task](../../ingestion/native-batch.md) with an [Azure input source](../../ingestion/input-sources.md#azure-input-source) (`azureStorage`) to read objects directly from Azure Blob Storage. + +### Store segments in Azure + +:::info + +To enable Azure for deep storage, explicitly enable it by setting `druid.storage.type=azure`. 
+ +::: + +#### Configuration - Location + +Use the following configuration to setup the where to store segments: + +| Property | Description | Default | +|---|---|---| +| `druid.azure.account` | The Azure Storage account name. | Must be set. | +| `druid.azure.container` | The Azure Storage container name. | Must be set. | +| `druid.azure.prefix` | A prefix string that will be prepended to the blob names for the segments published. | "" | +| `druid.azure.maxTries` | Number of tries before canceling an Azure operation. | 3 | +| `druid.azure.protocol` | The protocol to use to connect to the Azure Storage account. Either `http` or `https`. | `https` | +| `druid.azure.storageAccountEndpointSuffix` | The Storage account endpoint to use. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api) or storage accounts with [Azure DNS zone endpoints](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview#azure-dns-zone-endpoints-preview).<br/><br/>Do _not_ include the storage account name prefix in this config value.<br/><br/>Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`. | `blob.core.windows.net` | + +#### Configuration - Authentication + +The Azure extension currently supports authenticating with either an [SAS Token](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview), a [Shared Key](https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key), or by using the default Azure credentials chain ([`DefaultAzureCredential`](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme#defaultazurecredential)). Use the following configuration to use either of these options: + +:::info + +One authentication needs to be provided. Set one of `sharedAccessStorageToken`, `key` or `useAzureCredentialsChain`. 
+ +::: + +| Property | Description | Default | +|---|---|---| +| `druid.azure.sharedAccessStorageToken` | The SAS (Shared Storage Access) token. | | +| `druid.azure.key` | The Shared Key. | | +| `druid.azure.useAzureCredentialsChain` | If `true`, use `DefaultAzureCredential` for authentication. | `false` | +| `druid.azure.managedIdentityClientId` | To use managed identity authentication in the `DefaultAzureCredential`, set `useAzureCredentialsChain` to `true` and provide the client ID here. | | + +### Persist task logs in Azure + +:::info + +To enable Azure for persisting task logs, explicitly enable it by setting `druid.indexer.logs.type=azure`. + +::: + +Task logs are persisted using the account and authentication method configured for storing segments. Use the following configuration to setup where to store the task logs: Review Comment: ```suggestion Druid stores task logs using the storage account and authentication method configured for storing segments. Use the following configuration to set up where to store the task logs: ``` ########## docs/ingestion/input-sources.md: ########## @@ -581,9 +584,11 @@ in `druid.ingestion.hdfs.allowedProtocols`. See [HDFS input source security conf The HTTP input source is to support reading files directly from remote sites via HTTP. :::info Review Comment: ```suggestion :::info Security notes ``` This becomes the [title](https://docusaurus.io/docs/2.x/markdown-features/admonitions#specifying-title) of the admonition ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. --> +## Azure extension + +This extension allows you to do three things: Review Comment: ```suggestion This extension allows you to do the following: ``` We try not to use numbers in these cases to make the text less brittle ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. 
--> +## Azure extension + +This extension allows you to do three things: + +* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage. +* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage. +* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage. + +:::info To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list. -## Deep Storage - -[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration. - -|Property|Description|Possible Values|Default| -|--------|---------------|-----------|-------| -|`druid.storage.type`|azure||Must be set.| -|`druid.azure.account`||Azure Storage account name.|Must be set.| -|`druid.azure.key`||Azure Storage account key.|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.| -|`druid.azure.sharedAccessStorageToken`||Azure Shared Storage access token|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain..| -|`druid.azure.useAzureCredentialsChain`|Use [DefaultAzureCredential](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable) for authentication|Optional. 
Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.|False| -|`druid.azure.managedIdentityClientId`|If you want to use managed identity authentication in the `DefaultAzureCredential`, `useAzureCredentialsChain` must be true.||Optional.| -|`druid.azure.container`||Azure Storage container name.|Must be set.| -|`druid.azure.prefix`|A prefix string that will be prepended to the blob names for the segments published to Azure deep storage| |""| -|`druid.azure.protocol`|the protocol to use|http or https|https| -|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3| -|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024| -|`druid.azure.storageAccountEndpointSuffix`| The endpoint suffix to use. Use this config instead of `druid.azure.endpointSuffix`. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api). This config supports storage accounts enabled for [AzureDNSZone](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal). Note: do not include the storage account name prefix in this config value. | Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`| `blob.core.windows.net`| -See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information. +::: + +### Ingest data from Azure + +Ingest data using either [MSQ](../../multi-stage-query/index.md) or a native batch [parallel task](../../ingestion/native-batch.md) with an [Azure input source](../../ingestion/input-sources.md#azure-input-source) (`azureStorage`) to read objects directly from Azure Blob Storage. + +### Store segments in Azure + +:::info + +To enable Azure for deep storage, explicitly enable it by setting `druid.storage.type=azure`. 
+ +::: + +#### Configuration - Location Review Comment: ```suggestion #### Configure location ``` ########## docs/ingestion/input-sources.md: ########## @@ -581,9 +584,11 @@ in `druid.ingestion.hdfs.allowedProtocols`. See [HDFS input source security conf The HTTP input source is to support reading files directly from remote sites via HTTP. :::info - **Security notes:** Ingestion tasks run under the operating system account that runs the Druid processes, for example the Indexer, Middle Manager, and Peon. This means any user who can submit an ingestion task can specify an input source referring to any location that the Druid process can access. For example, using `http` input source, users may have access to internal network servers. - The `http` input source is not limited to the HTTP or HTTPS protocols. It uses the Java URI class that supports HTTP, HTTPS, FTP, file, and jar protocols by default. +**Security notes:** Ingestion tasks run under the operating system account that runs the Druid processes, for example the Indexer, Middle Manager, and Peon. This means any user who can submit an ingestion task can specify an input source referring to any location that the Druid process can access. For example, using `http` input source, users may have access to internal network servers. Review Comment: ```suggestion Ingestion tasks run under the operating system account that runs the Druid processes, for example the Indexer, Middle Manager, and Peon. This means any user who can submit an ingestion task can specify an input source referring to any location that the Druid process can access. For example, using `http` input source, users may have access to internal network servers. ``` ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. --> +## Azure extension + +This extension allows you to do three things: + +* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage. 
+* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage. +* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage. + +:::info To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list. -## Deep Storage - -[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration. - -|Property|Description|Possible Values|Default| -|--------|---------------|-----------|-------| -|`druid.storage.type`|azure||Must be set.| -|`druid.azure.account`||Azure Storage account name.|Must be set.| -|`druid.azure.key`||Azure Storage account key.|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.| -|`druid.azure.sharedAccessStorageToken`||Azure Shared Storage access token|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain..| -|`druid.azure.useAzureCredentialsChain`|Use [DefaultAzureCredential](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable) for authentication|Optional. 
Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.|False| -|`druid.azure.managedIdentityClientId`|If you want to use managed identity authentication in the `DefaultAzureCredential`, `useAzureCredentialsChain` must be true.||Optional.| -|`druid.azure.container`||Azure Storage container name.|Must be set.| -|`druid.azure.prefix`|A prefix string that will be prepended to the blob names for the segments published to Azure deep storage| |""| -|`druid.azure.protocol`|the protocol to use|http or https|https| -|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3| -|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024| -|`druid.azure.storageAccountEndpointSuffix`| The endpoint suffix to use. Use this config instead of `druid.azure.endpointSuffix`. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api). This config supports storage accounts enabled for [AzureDNSZone](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal). Note: do not include the storage account name prefix in this config value. | Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`| `blob.core.windows.net`| -See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information. +::: + +### Ingest data from Azure + +Ingest data using either [MSQ](../../multi-stage-query/index.md) or a native batch [parallel task](../../ingestion/native-batch.md) with an [Azure input source](../../ingestion/input-sources.md#azure-input-source) (`azureStorage`) to read objects directly from Azure Blob Storage. + +### Store segments in Azure + +:::info + +To enable Azure for deep storage, explicitly enable it by setting `druid.storage.type=azure`. 
+ +::: + +#### Configuration - Location + +Use the following configuration to setup the where to store segments: Review Comment: ```suggestion Configure where to store segments using the following properties: ``` ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. --> +## Azure extension + +This extension allows you to do three things: + +* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage. +* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage. +* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage. + +:::info To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list. -## Deep Storage - -[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration. - -|Property|Description|Possible Values|Default| -|--------|---------------|-----------|-------| -|`druid.storage.type`|azure||Must be set.| -|`druid.azure.account`||Azure Storage account name.|Must be set.| -|`druid.azure.key`||Azure Storage account key.|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.| -|`druid.azure.sharedAccessStorageToken`||Azure Shared Storage access token|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain..| -|`druid.azure.useAzureCredentialsChain`|Use [DefaultAzureCredential](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable) for authentication|Optional. 
Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.|False| -|`druid.azure.managedIdentityClientId`|If you want to use managed identity authentication in the `DefaultAzureCredential`, `useAzureCredentialsChain` must be true.||Optional.| -|`druid.azure.container`||Azure Storage container name.|Must be set.| -|`druid.azure.prefix`|A prefix string that will be prepended to the blob names for the segments published to Azure deep storage| |""| -|`druid.azure.protocol`|the protocol to use|http or https|https| -|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3| -|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024| -|`druid.azure.storageAccountEndpointSuffix`| The endpoint suffix to use. Use this config instead of `druid.azure.endpointSuffix`. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api). This config supports storage accounts enabled for [AzureDNSZone](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal). Note: do not include the storage account name prefix in this config value. | Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`| `blob.core.windows.net`| -See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information. +::: + +### Ingest data from Azure + +Ingest data using either [MSQ](../../multi-stage-query/index.md) or a native batch [parallel task](../../ingestion/native-batch.md) with an [Azure input source](../../ingestion/input-sources.md#azure-input-source) (`azureStorage`) to read objects directly from Azure Blob Storage. + +### Store segments in Azure + +:::info + +To enable Azure for deep storage, explicitly enable it by setting `druid.storage.type=azure`. 
Review Comment: ```suggestion To use Azure for deep storage, set `druid.storage.type=azure`. ``` "To enable" almost makes it sound like you can enable multiple deep storage options ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. --> +## Azure extension + +This extension allows you to do three things: + +* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage. +* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage. +* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage. + +:::info To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list. -## Deep Storage - -[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration. - -|Property|Description|Possible Values|Default| -|--------|---------------|-----------|-------| -|`druid.storage.type`|azure||Must be set.| -|`druid.azure.account`||Azure Storage account name.|Must be set.| -|`druid.azure.key`||Azure Storage account key.|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.| -|`druid.azure.sharedAccessStorageToken`||Azure Shared Storage access token|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain..| -|`druid.azure.useAzureCredentialsChain`|Use [DefaultAzureCredential](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable) for authentication|Optional. 
Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.|False| -|`druid.azure.managedIdentityClientId`|If you want to use managed identity authentication in the `DefaultAzureCredential`, `useAzureCredentialsChain` must be true.||Optional.| -|`druid.azure.container`||Azure Storage container name.|Must be set.| -|`druid.azure.prefix`|A prefix string that will be prepended to the blob names for the segments published to Azure deep storage| |""| -|`druid.azure.protocol`|the protocol to use|http or https|https| -|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3| -|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024| -|`druid.azure.storageAccountEndpointSuffix`| The endpoint suffix to use. Use this config instead of `druid.azure.endpointSuffix`. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api). This config supports storage accounts enabled for [AzureDNSZone](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal). Note: do not include the storage account name prefix in this config value. | Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`| `blob.core.windows.net`| -See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information. +::: + +### Ingest data from Azure + +Ingest data using either [MSQ](../../multi-stage-query/index.md) or a native batch [parallel task](../../ingestion/native-batch.md) with an [Azure input source](../../ingestion/input-sources.md#azure-input-source) (`azureStorage`) to read objects directly from Azure Blob Storage. + +### Store segments in Azure + +:::info + +To enable Azure for deep storage, explicitly enable it by setting `druid.storage.type=azure`. 
+ +::: + +#### Configuration - Location + +Use the following configuration to setup the where to store segments: + +| Property | Description | Default | +|---|---|---| +| `druid.azure.account` | The Azure Storage account name. | Must be set. | +| `druid.azure.container` | The Azure Storage container name. | Must be set. | +| `druid.azure.prefix` | A prefix string that will be prepended to the blob names for the segments published. | "" | +| `druid.azure.maxTries` | Number of tries before canceling an Azure operation. | 3 | +| `druid.azure.protocol` | The protocol to use to connect to the Azure Storage account. Either `http` or `https`. | `https` | +| `druid.azure.storageAccountEndpointSuffix` | The Storage account endpoint to use. Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api) or storage accounts with [Azure DNS zone endpoints](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview#azure-dns-zone-endpoints-preview).<br/><br/>Do _not_ include the storage account name prefix in this config value.<br/><br/>Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`. | `blob.core.windows.net` | + +#### Configuration - Authentication Review Comment: ```suggestion #### Configure authentication ``` ########## docs/development/extensions-core/azure.md: ########## @@ -22,25 +22,75 @@ title: "Microsoft Azure" ~ under the License. --> +## Azure extension + +This extension allows you to do three things: + +* [Ingest data](#ingest-data-from-azure) from objects stored in Azure Blob Storage. +* [Write segments](#store-segments-in-azure) to Azure Blob Storage for deep storage. +* [Persist task logs](#persist-task-logs-in-azure) to Azure blob storage for long-term storage. 
+ +:::info To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-azure-extensions` in the extensions load list. -## Deep Storage - -[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration. - -|Property|Description|Possible Values|Default| -|--------|---------------|-----------|-------| -|`druid.storage.type`|azure||Must be set.| -|`druid.azure.account`||Azure Storage account name.|Must be set.| -|`druid.azure.key`||Azure Storage account key.|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.| -|`druid.azure.sharedAccessStorageToken`||Azure Shared Storage access token|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain..| -|`druid.azure.useAzureCredentialsChain`|Use [DefaultAzureCredential](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable) for authentication|Optional. Set one of key, sharedAccessStorageToken or useAzureCredentialsChain.|False| -|`druid.azure.managedIdentityClientId`|If you want to use managed identity authentication in the `DefaultAzureCredential`, `useAzureCredentialsChain` must be true.||Optional.| -|`druid.azure.container`||Azure Storage container name.|Must be set.| -|`druid.azure.prefix`|A prefix string that will be prepended to the blob names for the segments published to Azure deep storage| |""| -|`druid.azure.protocol`|the protocol to use|http or https|https| -|`druid.azure.maxTries`|Number of tries before canceling an Azure operation.| |3| -|`druid.azure.maxListingLength`|maximum number of input files matching a given prefix to retrieve at a time| |1024| -|`druid.azure.storageAccountEndpointSuffix`| The endpoint suffix to use. Use this config instead of `druid.azure.endpointSuffix`. 
Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api). This config supports storage accounts enabled for [AzureDNSZone](https://learn.microsoft.com/en-us/azure/dns/dns-getstarted-portal). Note: do not include the storage account name prefix in this config value. | Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`| `blob.core.windows.net`| -See [Azure Services](http://azure.microsoft.com/en-us/pricing/free-trial/) for more information. +::: + +### Ingest data from Azure + +Ingest data using either [MSQ](../../multi-stage-query/index.md) or a native batch [parallel task](../../ingestion/native-batch.md) with an [Azure input source](../../ingestion/input-sources.md#azure-input-source) (`azureStorage`) to read objects directly from Azure Blob Storage. + +### Store segments in Azure + +:::info + +To enable Azure for deep storage, explicitly enable it by setting `druid.storage.type=azure`. + +::: + +#### Configuration - Location + +Use the following configuration to setup the where to store segments: + +| Property | Description | Default | +|---|---|---| +| `druid.azure.account` | The Azure Storage account name. | Must be set. | +| `druid.azure.container` | The Azure Storage container name. | Must be set. | +| `druid.azure.prefix` | A prefix string that will be prepended to the blob names for the segments published. | "" | +| `druid.azure.maxTries` | Number of tries before canceling an Azure operation. | 3 | +| `druid.azure.protocol` | The protocol to use to connect to the Azure Storage account. Either `http` or `https`. | `https` | +| `druid.azure.storageAccountEndpointSuffix` | The Storage account endpoint to use. 
Override the default value to connect to [Azure Government](https://learn.microsoft.com/en-us/azure/azure-government/documentation-government-get-started-connect-to-storage#getting-started-with-storage-api) or storage accounts with [Azure DNS zone endpoints](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview#azure-dns-zone-endpoints-preview).<br/><br/>Do _not_ include the storage account name prefix in this config value.<br/><br/>Examples: `ABCD1234.blob.storage.azure.net`, `blob.core.usgovcloudapi.net`. | `blob.core.windows.net` | + +#### Configuration - Authentication + +The Azure extension currently supports authenticating with either an [SAS Token](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview), a [Shared Key](https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key), or by using the default Azure credentials chain ([`DefaultAzureCredential`](https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme#defaultazurecredential)). Use the following configuration to use either of these options: + +:::info + +One authentication needs to be provided. Set one of `sharedAccessStorageToken`, `key` or `useAzureCredentialsChain`. + +::: + +| Property | Description | Default | +|---|---|---| +| `druid.azure.sharedAccessStorageToken` | The SAS (Shared Storage Access) token. | | +| `druid.azure.key` | The Shared Key. | | +| `druid.azure.useAzureCredentialsChain` | If `true`, use `DefaultAzureCredential` for authentication. | `false` | +| `druid.azure.managedIdentityClientId` | To use managed identity authentication in the `DefaultAzureCredential`, set `useAzureCredentialsChain` to `true` and provide the client ID here. | | + +### Persist task logs in Azure + +:::info + +To enable Azure for persisting task logs, explicitly enable it by setting `druid.indexer.logs.type=azure`. 
Review Comment: ```suggestion To persist task logs in Azure Blob Storage, set `druid.indexer.logs.type=azure`. ``` ########## docs/ingestion/input-sources.md: ########## @@ -813,16 +818,16 @@ rolled-up datasource `wikipedia_rollup` by grouping on hour, "countryName", and ``` :::info - Note: Older versions (0.19 and earlier) did not respect the timestampSpec when using the Druid input source. If you - have ingestion specs that rely on this and cannot rewrite them, set - [`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`](../configuration/index.md#indexer-general-configuration) - to `true` to enable a compatibility mode where the timestampSpec is ignored. + +Note: Older versions (0.19 and earlier) did not respect the timestampSpec when using the Druid input source. If you have ingestion specs that rely on this and cannot rewrite them, set [`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`](../configuration/index.md#indexer-general-configuration) to `true` to enable a compatibility mode where the timestampSpec is ignored. Review Comment: ```suggestion Older versions (0.19 and earlier) did not respect the timestampSpec when using the Druid input source. If you have ingestion specs that rely on this and cannot rewrite them, set [`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`](../configuration/index.md#indexer-general-configuration) to `true` to enable a compatibility mode where the timestampSpec is ignored. ``` ########## docs/ingestion/input-sources.md: ########## @@ -210,13 +212,17 @@ Properties Object: |assumeRoleExternalId|A unique identifier that might be required when you assume a role in another account [see](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html)|None|no| :::info - **Note:** If `accessKeyId` and `secretAccessKey` are not given, the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used. 
+ +**Note:** If `accessKeyId` and `secretAccessKey` are not given, the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used. Review Comment: ```suggestion If `accessKeyId` and `secretAccessKey` are not given, the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used. ```
