This is an automated email from the ASF dual-hosted git repository.
techdocsmith pushed a commit to branch 28.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/28.0.0 by this push:
new f8ecf9380f7 [backport]docs: durable storage azure cleanup (#15120) (#15296)
f8ecf9380f7 is described below
commit f8ecf9380f7b1952b48221589e859f8419adffd2
Author: 317brian <[email protected]>
AuthorDate: Wed Nov 1 13:14:17 2023 -0700
[backport]docs: durable storage azure cleanup (#15120) (#15296)
Co-authored-by: Laksh Singla <[email protected]>
---
docs/multi-stage-query/reference.md | 54 ++++++++++++++++++++-----------------
docs/multi-stage-query/security.md | 25 ++++++++++-------
docs/operations/durable-storage.md | 17 ++++++++----
3 files changed, 57 insertions(+), 39 deletions(-)
diff --git a/docs/multi-stage-query/reference.md b/docs/multi-stage-query/reference.md
index 3fd2335d052..a497afa3a71 100644
--- a/docs/multi-stage-query/reference.md
+++ b/docs/multi-stage-query/reference.md
@@ -354,40 +354,44 @@ SQL-based ingestion supports using durable storage to store intermediate files t
### Durable storage configurations
-Durable storage is supported on Amazon S3 storage and Microsoft's Azure storage. There are a few common configurations that controls the behavior for both the services as documented below. Apart from the common configurations,
-there are a few properties specific to each storage that must be set.
+Durable storage is supported on Amazon S3 storage and Microsoft's Azure Blob Storage.
+There are common configurations that control the behavior regardless of which storage service you use. Apart from these common configurations, there are a few properties specific to S3 and to Azure.
Common properties to configure the behavior of durable storage
-|Parameter |Default | Description |
-|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.enable` | false | Whether to enable durable storage for the cluster. Set it to true to enable durable storage. For more information about enabling durable storage, see [Durable storage](../operations/durable-storage.md).|
-|`druid.msq.intermediate.storage.type` | n/a | Required. The type of storage to use. Set it to `s3` for S3 and `azure` for Azure |
-|`druid.msq.intermediate.storage.tempDir`| n/a | Required. Directory path on the local disk to store temporary files required while uploading and downloading the data |
-|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. |
-|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls made to the durable storage, however it requires more disk space to store the temporary chunks. Druid uses a default of 100MiB if the value is not provided.|
+|Parameter | Required | Description | Default |
+|--|--|--|--|
+|`druid.msq.intermediate.storage.enable` | Yes | Whether to enable durable storage for the cluster. Set it to true to enable durable storage. For more information about enabling durable storage, see [Durable storage](../operations/durable-storage.md). | false |
+|`druid.msq.intermediate.storage.type` | Yes | The type of storage to use. Set it to `s3` for S3 and `azure` for Azure. | n/a |
+|`druid.msq.intermediate.storage.tempDir`| Yes | Directory path on the local disk to store temporary files required while uploading and downloading the data. | n/a |
+|`druid.msq.intermediate.storage.maxRetry` | No | Defines the maximum number of times to attempt S3 API calls to avoid failures due to transient errors. | 10 |
+|`druid.msq.intermediate.storage.chunkSize` | No | Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A larger chunk size reduces the API calls made to durable storage, but it requires more disk space to store the temporary chunks. | 100MiB |
-Following properties need to be set in addition to the common properties to enable durable storage on S3
+To use S3 for durable storage, you also need to configure the following properties:
-|Parameter |Default | Description |
-|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.bucket` | n/a | Required. The S3 bucket where the files are uploaded to and download from |
-|`druid.msq.intermediate.storage.prefix` | n/a | Required. Path prepended to all the paths uploaded to the bucket to namespace the connector's files. Provide a unique value for the prefix and do not share the same prefix between different clusters. If the location includes other files or directories, then they might get cleaned up as well. |
+|Parameter | Required | Description | Default |
+|--|--|--|--|
+|`druid.msq.intermediate.storage.bucket` | Yes | The S3 bucket where the files are uploaded to and downloaded from. | n/a |
+|`druid.msq.intermediate.storage.prefix` | Yes | Path prepended to all the paths uploaded to the bucket to namespace the connector's files. Provide a unique value for the prefix and do not share the same prefix between different clusters. If the location includes other files or directories, then they might get cleaned up as well. | n/a |
-Following properties must be set in addition to the common properties to enable durable storage on Azure.
+To use Azure for durable storage, you also need to configure the following properties:
-|Parameter |Default | Description |
-|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.container` | n/a | Required. The Azure container where the files are uploaded to and downloaded from. |
-|`druid.msq.intermediate.storage.prefix` | n/a | Required. Path prepended to all the paths uploaded to the container to namespace the connector's files. Provide a unique value for the prefix and do not share the same prefix between different clusters. If the location includes other files or directories, then they might get cleaned up as well. |
+|Parameter | Required | Description | Default |
+|--|--|--|--|
+|`druid.msq.intermediate.storage.container` | Yes | The Azure container where the files are uploaded to and downloaded from. | n/a |
+|`druid.msq.intermediate.storage.prefix` | Yes | Path prepended to all the paths uploaded to the container to namespace the connector's files. Provide a unique value for the prefix and do not share the same prefix between different clusters. If the location includes other files or directories, then they might get cleaned up as well. | n/a |
-Durable storage creates files on the remote storage and is cleaned up once the job no longer requires those files. However, due to failures causing abrupt exit of the tasks, these files might not get cleaned up.
-Therefore, there are certain properties that you configure on the Overlord specifically to clean up intermediate files for the tasks that have completed and would no longer require these files:
+### Durable storage cleaner configurations
-|Parameter |Default | Description |
-|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.cleaner.enabled`| false | Optional. Whether durable storage cleaner should be enabled for the cluster. |
-|`druid.msq.intermediate.storage.cleaner.delaySeconds`| 86400 | Optional. The delay (in seconds) after the last run post which the durable storage cleaner would clean the outputs. |
+Durable storage creates files on the remote storage, and these files get cleaned up once a job no longer requires them. However, if a task exits abruptly due to a failure, its files might not get cleaned up.
+You can configure the Overlord to periodically clean up these intermediate files after a task completes and the files are no longer needed. The files that get cleaned up are determined by the storage prefix you configure. Any files that match the path for the storage prefix may get cleaned up, not just intermediate files that are no longer needed.
+
+Use the following configurations to control the cleaner:
+
+|Parameter | Required | Description | Default |
+|--|--|--|--|
+|`druid.msq.intermediate.storage.cleaner.enabled`| No | Whether the durable storage cleaner should be enabled for the cluster. | false |
+|`druid.msq.intermediate.storage.cleaner.delaySeconds`| No | The delay (in seconds) after the latest run, after which the durable storage cleaner cleans up the files. | 86400 |
## Limits
diff --git a/docs/multi-stage-query/security.md b/docs/multi-stage-query/security.md
index 2d412f40654..3c395e40c57 100644
--- a/docs/multi-stage-query/security.md
+++ b/docs/multi-stage-query/security.md
@@ -60,17 +60,24 @@ Depending on what a user is trying to do, they might also need the following per
-## S3
+## Permissions for durable storage
-The MSQ task engine can use S3 to store intermediate files when running queries. This can increase its reliability but requires certain permissions in S3.
-These permissions are required if you configure durable storage.
+The MSQ task engine can use Amazon S3 or Azure Blob Storage to store intermediate files when running queries. To upload, read, move, and delete these intermediate files, the MSQ task engine requires certain permissions specific to the storage provider.
-Permissions for pushing and fetching intermediate stage results to and from S3:
+### S3
-- `s3:GetObject`
-- `s3:PutObject`
-- `s3:AbortMultipartUpload`
+The MSQ task engine needs the following permissions for pushing, fetching, and removing intermediate stage results to and from S3:
-Permissions for removing intermediate stage results:
+- `s3:GetObject` to retrieve files. Note that `GetObject` also requires read permission on the object that gets retrieved.
+- `s3:PutObject` to upload files.
+- `s3:AbortMultipartUpload` to cancel the upload of files.
+- `s3:DeleteObject` to delete files when they're no longer needed.
-- `s3:DeleteObject`
\ No newline at end of file
+### Azure
+
+The MSQ task engine needs the following permissions for pushing, fetching, and removing intermediate stage results to and from Azure:
+
+- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` to read and list files in durable storage.
+- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write` to write files in durable storage.
+- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action` to create files in durable storage.
+- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete` to delete files when they're no longer needed.
\ No newline at end of file
diff --git a/docs/operations/durable-storage.md b/docs/operations/durable-storage.md
index 80545f9a9b2..b7a8ad1ef90 100644
--- a/docs/operations/durable-storage.md
+++ b/docs/operations/durable-storage.md
@@ -39,13 +39,20 @@ To enable durable storage, you need to set the following common service properti
```
druid.msq.intermediate.storage.enable=true
-druid.msq.intermediate.storage.type=s3
-druid.msq.intermediate.storage.bucket=YOUR_BUCKET
-druid.msq.intermediate.storage.prefix=YOUR_PREFIX
druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir
+
+# Include these configs if you're using S3
+# druid.msq.intermediate.storage.type=s3
+# druid.msq.intermediate.storage.bucket=YOUR_BUCKET
+
+# Include these configs if you're using Azure Blob Storage
+# druid.msq.intermediate.storage.type=azure
+# druid.msq.intermediate.storage.container=YOUR_CONTAINER
+
+druid.msq.intermediate.storage.prefix=YOUR_PREFIX
```
-For detailed information about the settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).
+For detailed information about these and additional settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).
## Use durable storage for SQL-based ingestion queries
@@ -80,7 +87,7 @@ cleaner can be scheduled to clean the directories corresponding to which there i
the storage connector to work upon the durable storage. The durable storage location should only be utilized to store the output
for the cluster's MSQ tasks. If the location contains other files or directories, then they will get cleaned up as well.
-Use `druid.msq.intermediate.storage.cleaner.enabled` and `druid.msq.intermediate.storage.cleaner.delaySEconds` to configure the cleaner. For more information, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).
+Use `druid.msq.intermediate.storage.cleaner.enabled` and `druid.msq.intermediate.storage.cleaner.delaySeconds` to configure the cleaner. For more information, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).
Note that if you choose to write query results to durable storage, the results are cleaned up when the task is removed from the metadata store.
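As an aside on the `chunkSize` tradeoff described in the updated reference table (fewer API calls vs. more temporary disk space), the effect can be sketched numerically. This is a hypothetical back-of-the-envelope model, not Druid's implementation; the function name and the one-chunk-staged-at-a-time assumption are illustrative only:

```python
import math

def upload_plan(file_size_bytes: int, chunk_size_bytes: int) -> dict:
    """Simplified model: a file is split into chunks, each chunk is one
    upload API call, and one chunk at a time is staged in tempDir."""
    calls = math.ceil(file_size_bytes / chunk_size_bytes)
    staged = min(file_size_bytes, chunk_size_bytes)
    return {"api_calls": calls, "temp_disk_bytes": staged}

MiB = 1024 * 1024
# A 1 GiB intermediate file with the default 100 MiB chunk size:
print(upload_plan(1024 * MiB, 100 * MiB))
# The same file with a 1 GiB chunk size: a single call, but the whole
# file staged on local disk at once.
print(upload_plan(1024 * MiB, 1024 * MiB))
```

Under this model, raising `chunkSize` from 100 MiB to 1 GiB cuts the call count roughly tenfold for a 1 GiB file while multiplying the per-chunk temp disk footprint by the same factor.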
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]