This is an automated email from the ASF dual-hosted git repository.

techdocsmith pushed a commit to branch 28.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/28.0.0 by this push:
     new f8ecf9380f7 [backport]docs: durable storage azure cleanup (#15120) 
(#15296)
f8ecf9380f7 is described below

commit f8ecf9380f7b1952b48221589e859f8419adffd2
Author: 317brian <[email protected]>
AuthorDate: Wed Nov 1 13:14:17 2023 -0700

    [backport]docs: durable storage azure cleanup (#15120) (#15296)
    
    Co-authored-by: Laksh Singla <[email protected]>
---
 docs/multi-stage-query/reference.md | 54 ++++++++++++++++++++-----------------
 docs/multi-stage-query/security.md  | 25 ++++++++++-------
 docs/operations/durable-storage.md  | 17 ++++++++----
 3 files changed, 57 insertions(+), 39 deletions(-)

diff --git a/docs/multi-stage-query/reference.md 
b/docs/multi-stage-query/reference.md
index 3fd2335d052..a497afa3a71 100644
--- a/docs/multi-stage-query/reference.md
+++ b/docs/multi-stage-query/reference.md
@@ -354,40 +354,44 @@ SQL-based ingestion supports using durable storage to 
store intermediate files t
 
 ### Durable storage configurations
 
-Durable storage is supported on Amazon S3 storage and Microsoft's Azure 
storage. There are a few common configurations that controls the behavior for 
both the services as documented below. Apart from the common configurations,
-there are a few properties specific to each storage that must be set.
+Durable storage is supported on Amazon S3 storage and Microsoft's Azure Blob 
Storage. 
+There are common configurations that control the behavior regardless of which 
storage service you use. Apart from these common configurations, there are a 
few properties specific to S3 and to Azure.
 
 Common properties to configure the behavior of durable storage
 
-|Parameter          |Default                                 | Description     
     |
-|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.enable` | false |  Whether to enable durable 
storage for the cluster. Set it to true to enable durable storage. For more 
information about enabling durable storage, see [Durable 
storage](../operations/durable-storage.md).|
-|`druid.msq.intermediate.storage.type` | n/a | Required. The type of storage 
to use. Set it to `s3` for S3 and `azure` for Azure |
-|`druid.msq.intermediate.storage.tempDir`| n/a | Required. Directory path on 
the local disk to store temporary files required while uploading and 
downloading the data  |
-|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the max 
number times to attempt S3 API calls to avoid failures due to transient errors. 
| 
-|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the 
size of each chunk to temporarily store in 
`druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB 
and 5 GiB. A large chunk size reduces the API calls made to the durable 
storage, however it requires more disk space to store the temporary chunks. 
Druid uses a default of 100MiB if the value is not provided.| 
+|Parameter          | Required | Description          | Default |
+|--|--|--|--|
+|`druid.msq.intermediate.storage.enable`  | Yes |  Whether to enable durable 
storage for the cluster. Set it to true to enable durable storage. For more 
information about enabling durable storage, see [Durable 
storage](../operations/durable-storage.md). | false | 
+|`druid.msq.intermediate.storage.type` | Yes | The type of storage to use. Set it to `s3` for S3 and `azure` for Azure. | n/a |
+|`druid.msq.intermediate.storage.tempDir`| Yes |  Directory path on the local 
disk to store temporary files required while uploading and downloading the data 
 | n/a |
+|`druid.msq.intermediate.storage.maxRetry` | No | Defines the maximum number of times to attempt S3 API calls to avoid failures due to transient errors. | 10 |
+|`druid.msq.intermediate.storage.chunkSize` | No | Defines the size of each 
chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The 
chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API 
calls made to the durable storage, however it requires more disk space to store 
the temporary chunks. Druid uses a default of 100MiB if the value is not 
provided.| 100MiB | 
 
-Following properties need to be set in addition to the common properties to 
enable durable storage on S3
+To use S3 for durable storage, you also need to configure the following 
properties:
 
-|Parameter          |Default                                 | Description     
     |
-|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.bucket` | n/a | Required. The S3 bucket where 
the files are uploaded to and download from |
-|`druid.msq.intermediate.storage.prefix` | n/a | Required. Path prepended to 
all the paths uploaded to the bucket to namespace the connector's files. 
Provide a unique value for the prefix and do not share the same prefix between 
different clusters. If the location includes other files or directories, then 
they might get cleaned up as well.  |
+|Parameter          | Required | Description  | Default |
+|--|--|--|--|
+|`druid.msq.intermediate.storage.bucket` | Yes | The S3 bucket where the files are uploaded to and downloaded from. | n/a |
+|`druid.msq.intermediate.storage.prefix` | Yes | Path prepended to all the 
paths uploaded to the bucket to namespace the connector's files. Provide a 
unique value for the prefix and do not share the same prefix between different 
clusters. If the location includes other files or directories, then they might 
get cleaned up as well.  | n/a | 
 
-Following properties must be set in addition to the common properties to 
enable durable storage on Azure.  
+To use Azure for durable storage, you also need to configure the following 
properties:
 
-|Parameter          |Default                                 | Description     
     |
-|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.container` | n/a | Required. The Azure 
container where the files are uploaded to and downloaded from.  |
-|`druid.msq.intermediate.storage.prefix` | n/a | Required. Path prepended to 
all the paths uploaded to the container to namespace the connector's files. 
Provide a unique value for the prefix and do not share the same prefix between 
different clusters. If the location includes other files or directories, then 
they might get cleaned up as well. |
+|Parameter          | Required  | Description          | Default |
+|--|--|--|--|
+|`druid.msq.intermediate.storage.container` | Yes | The Azure container where 
the files are uploaded to and downloaded from.  | n/a |
+|`druid.msq.intermediate.storage.prefix` | Yes | Path prepended to all the 
paths uploaded to the container to namespace the connector's files. Provide a 
unique value for the prefix and do not share the same prefix between different 
clusters. If the location includes other files or directories, then they might 
get cleaned up as well. | n/a |
 
-Durable storage creates files on the remote storage and is cleaned up once the 
job no longer requires those files. However, due to failures causing abrupt 
exit of the tasks, these files might not get cleaned up.
-Therefore, there are certain properties that you configure on the Overlord 
specifically to clean up intermediate files for the tasks that have completed 
and would no longer require these files:
+### Durable storage cleaner configurations
 
-|Parameter          |Default                                 | Description     
     |
-|-------------------|----------------------------------------|----------------------|
-|`druid.msq.intermediate.storage.cleaner.enabled`| false | Optional. Whether 
durable storage cleaner should be enabled for the cluster.  |
-|`druid.msq.intermediate.storage.cleaner.delaySeconds`| 86400 | Optional. The 
delay (in seconds) after the last run post which the durable storage cleaner 
would clean the outputs.  |
+Durable storage creates files on the remote storage, and these files get 
cleaned up once a job no longer requires those files. However, due to failures 
causing abrupt exits of tasks, these files might not get cleaned up.
+You can configure the Overlord to periodically clean up these intermediate files after a task completes and the files are no longer needed. The files that get cleaned up are determined by the storage prefix you configure. Any files that match the path for the storage prefix may get cleaned up, not just intermediate files that are no longer needed.
+
+Use the following configurations to control the cleaner:
+
+|Parameter  | Required  | Description | Default | 
+|--|--|--|--|
+|`druid.msq.intermediate.storage.cleaner.enabled`|  No | Whether durable 
storage cleaner should be enabled for the cluster.  | false |
+|`druid.msq.intermediate.storage.cleaner.delaySeconds`| No | The delay (in seconds) after the last run before the durable storage cleaner cleans up the files. | 86400 |
 
 
 ## Limits
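
As an aside on the `chunkSize` tradeoff documented in the tables above: the number of upload calls for one intermediate file is roughly the file size divided by the chunk size, while each in-flight chunk occupies that much space in `tempDir`. A quick sketch (the file and chunk sizes below are illustrative, not taken from this patch):

```python
import math

MIB = 2 ** 20
GIB = 2 ** 30

def upload_parts(file_size: int, chunk_size: int) -> int:
    """Number of chunks, and therefore roughly the number of upload API calls,
    needed to push one intermediate file to durable storage."""
    return math.ceil(file_size / chunk_size)

# A 5 GiB intermediate file with the 100 MiB default vs. a 1 GiB chunk size:
# larger chunks mean fewer API calls but more tempDir disk per chunk.
default_parts = upload_parts(5 * GIB, 100 * MIB)  # 52 parts
large_parts = upload_parts(5 * GIB, 1 * GIB)      # 5 parts
```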
diff --git a/docs/multi-stage-query/security.md 
b/docs/multi-stage-query/security.md
index 2d412f40654..3c395e40c57 100644
--- a/docs/multi-stage-query/security.md
+++ b/docs/multi-stage-query/security.md
@@ -60,17 +60,24 @@ Depending on what a user is trying to do, they might also 
need the following per
 
 
 
-## S3
+## Permissions for durable storage
 
-The MSQ task engine can use S3 to store intermediate files when running 
queries. This can increase its reliability but requires certain permissions in 
S3.
-These permissions are required if you configure durable storage. 
+The MSQ task engine can use Amazon S3 or Azure Blob Storage to store intermediate files when running queries. To upload, read, move, and delete these intermediate files, the MSQ task engine requires certain permissions specific to the storage provider.
 
-Permissions for pushing and fetching intermediate stage results to and from S3:
+### S3
 
-- `s3:GetObject`
-- `s3:PutObject`
-- `s3:AbortMultipartUpload`
+The MSQ task engine needs the following permissions for pushing, fetching, and removing intermediate stage results to and from S3:
 
-Permissions for removing intermediate stage results:
+- `s3:GetObject` to retrieve files. Note that `GetObject` also requires read 
permission on the object that gets retrieved. 
+- `s3:PutObject` to upload files.
+- `s3:AbortMultipartUpload` to cancel the upload of files.
+- `s3:DeleteObject` to delete files when they're no longer needed. 
 
-- `s3:DeleteObject`
\ No newline at end of file
+### Azure
+
+The MSQ task engine needs the following permissions for pushing, fetching, and 
removing intermediate stage results to and from Azure:
+
+- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` to 
read and list files in durable storage.
+- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write` to 
write files in durable storage.
+- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action` 
to create files in durable storage.
+- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete` to 
delete files when they're no longer needed.
\ No newline at end of file
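
For reference, the four S3 actions listed in this hunk could be granted with an IAM policy statement along these lines. This is only a sketch: the bucket and prefix in the resource ARN are placeholders, and you would scope them to the bucket and prefix you set in `druid.msq.intermediate.storage.bucket` and `druid.msq.intermediate.storage.prefix`.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::YOUR_BUCKET/YOUR_PREFIX/*"
    }
  ]
}
```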
diff --git a/docs/operations/durable-storage.md 
b/docs/operations/durable-storage.md
index 80545f9a9b2..b7a8ad1ef90 100644
--- a/docs/operations/durable-storage.md
+++ b/docs/operations/durable-storage.md
@@ -39,13 +39,20 @@ To enable durable storage, you need to set the following 
common service properti
 
 ```
 druid.msq.intermediate.storage.enable=true
-druid.msq.intermediate.storage.type=s3
-druid.msq.intermediate.storage.bucket=YOUR_BUCKET
-druid.msq.intermediate.storage.prefix=YOUR_PREFIX
 druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir
+
+# Include these configs if you're using S3
+# druid.msq.intermediate.storage.type=s3
+# druid.msq.intermediate.storage.bucket=YOUR_BUCKET
+
+# Include these configs if you're using Azure Blob Storage
+# druid.msq.intermediate.storage.type=azure
+# druid.msq.intermediate.storage.container=YOUR_CONTAINER
+
+druid.msq.intermediate.storage.prefix=YOUR_PREFIX
 ```
 
-For detailed information about the settings related to durable storage, see 
[Durable storage 
configurations](../multi-stage-query/reference.md#durable-storage-configurations).
+For detailed information about these and additional settings related to 
durable storage, see [Durable storage 
configurations](../multi-stage-query/reference.md#durable-storage-configurations).
 
 
 ## Use durable storage for SQL-based ingestion queries
@@ -80,7 +87,7 @@ cleaner can be scheduled to clean the directories 
corresponding to which there i
 the storage connector to work upon the durable storage. The durable storage 
location should only be utilized to store the output
 for the cluster's MSQ tasks. If the location contains other files or 
directories, then they will get cleaned up as well.
 
-Use `druid.msq.intermediate.storage.cleaner.enabled` and 
`druid.msq.intermediate.storage.cleaner.delaySEconds` to configure the cleaner. 
For more information, see [Durable storage 
configurations](../multi-stage-query/reference.md#durable-storage-configurations).
+Use `druid.msq.intermediate.storage.cleaner.enabled` and 
`druid.msq.intermediate.storage.cleaner.delaySeconds` to configure the cleaner. 
For more information, see [Durable storage 
configurations](../multi-stage-query/reference.md#durable-storage-configurations).
 
 Note that if you choose to write query results to durable storage, the results are cleaned up when the task is removed from the metadata store.
 
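The required/optional split in the reference tables of this patch can be summarized as: three common properties are always required, and the storage type then determines whether `bucket` or `container` (plus `prefix`) must also be set. A small sketch of that rule (a hypothetical helper for illustration, not part of Druid):

```python
# Sketch: check a set of durable storage properties against the
# "Required" columns in the reference tables of this patch.
REQUIRED_COMMON = {
    "druid.msq.intermediate.storage.enable",
    "druid.msq.intermediate.storage.type",
    "druid.msq.intermediate.storage.tempDir",
}
REQUIRED_BY_TYPE = {
    "s3": {"druid.msq.intermediate.storage.bucket",
           "druid.msq.intermediate.storage.prefix"},
    "azure": {"druid.msq.intermediate.storage.container",
              "druid.msq.intermediate.storage.prefix"},
}

def missing_properties(props: dict) -> set:
    """Return the names of required properties absent from `props`."""
    missing = REQUIRED_COMMON - props.keys()
    storage_type = props.get("druid.msq.intermediate.storage.type")
    missing |= REQUIRED_BY_TYPE.get(storage_type, set()) - props.keys()
    return missing

example = {
    "druid.msq.intermediate.storage.enable": "true",
    "druid.msq.intermediate.storage.type": "azure",
    "druid.msq.intermediate.storage.tempDir": "/tmp/msq",
    "druid.msq.intermediate.storage.prefix": "msq-intermediate",
}
# `example` is missing the Azure container property.
```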


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
