cryptoe commented on code in PR #14035:
URL: https://github.com/apache/druid/pull/14035#discussion_r1160824147
##########
docs/multi-stage-query/reference.md:
##########
@@ -696,19 +696,55 @@ CLUSTERED BY user
## Durable Storage
-This section enumerates the advantages and performance implications of enabling durable storage while executing MSQ tasks.
+Using durable storage with your SQL-based ingestions can improve their reliability by writing intermediate files to a storage location temporarily.
To prevent durable storage from getting filled up with temporary files in case the tasks fail to clean them up, a periodic
cleaner can be scheduled to clean up the directories for which no controller task is running. It uses
the storage connector to work with the durable storage. The durable storage location should only be used to store the output
for the cluster's MSQ tasks. If the location contains other files or directories, then they will get cleaned up as well.
-Following table lists the properties that can be set to control the behavior of the durable storage of the cluster.
+
+### Enable durable storage
+
+To enable durable storage, you need to set the following common service properties:
+
+```
+druid.msq.intermediate.storage.enable=true
+druid.msq.intermediate.storage.type=s3
+druid.msq.intermediate.storage.bucket=YOUR_BUCKET
+druid.msq.intermediate.storage.prefix=YOUR_PREFIX
+druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir
+```
+
+For information about these settings and others related to durable storage, see [Durable storage configurations](#durable-storage-configurations).
Review Comment:
```suggestion
For detailed information about the settings related to durable storage, see [Durable storage configurations](#durable-storage-configurations).
```
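To make the required settings in the "Enable durable storage" snippet concrete, here is a minimal Java sketch of a pre-flight check. It is hypothetical and not part of Druid; `DurableStorageConfigCheck`, `parse`, and `validate` are made-up names. It assumes every key in the snippet must be present, that `bucket` and `prefix` only matter when `type` is `s3`, and that `chunkSize`, if set, is a plain byte count checked against the 5 MiB to 5 GiB range from the configuration table.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not part of Druid. Key names come from the docs diff;
// treating every key in the snippet as required is an assumption of this sketch.
class DurableStorageConfigCheck {
  static final String PREFIX = "druid.msq.intermediate.storage.";
  // Bounds from the configuration table: chunk size must be between 5 MiB and 5 GiB.
  static final long MIN_CHUNK_BYTES = 5L * 1024 * 1024;
  static final long MAX_CHUNK_BYTES = 5L * 1024 * 1024 * 1024;

  // Loads properties text, for example the lines from the "Enable durable storage" snippet.
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      // StringReader does no real I/O, so this should not happen in practice.
      throw new UncheckedIOException(e);
    }
    return props;
  }

  // Returns a list of problems; an empty list means the configuration passed the check.
  static List<String> validate(Properties props) {
    List<String> problems = new ArrayList<>();
    if (!"true".equalsIgnoreCase(props.getProperty(PREFIX + "enable"))) {
      problems.add(PREFIX + "enable must be set to true");
    }
    String type = props.getProperty(PREFIX + "type");
    if (!"s3".equals(type) && !"local".equals(type)) {
      problems.add(PREFIX + "type must be s3 or local");
    }
    if ("s3".equals(type)) {
      // Bucket and prefix are described as S3 settings, so only check them for s3.
      for (String key : new String[] {"bucket", "prefix"}) {
        if (isBlank(props.getProperty(PREFIX + key))) {
          problems.add(PREFIX + key + " is required when type is s3");
        }
      }
    }
    if (isBlank(props.getProperty(PREFIX + "tempDir"))) {
      problems.add(PREFIX + "tempDir is required");
    }
    String chunkSize = props.getProperty(PREFIX + "chunkSize");
    if (chunkSize != null) {
      // Assumption: a plain byte count. Size strings with unit suffixes are not handled here.
      try {
        long bytes = Long.parseLong(chunkSize.trim());
        if (bytes < MIN_CHUNK_BYTES || bytes > MAX_CHUNK_BYTES) {
          problems.add(PREFIX + "chunkSize must be between 5 MiB and 5 GiB");
        }
      } catch (NumberFormatException e) {
        problems.add(PREFIX + "chunkSize is not a plain byte count");
      }
    }
    return problems;
  }

  private static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }
}
```

Passing the snippet's five lines through `parse` and then `validate` returns an empty list; adding `druid.msq.intermediate.storage.chunkSize=1024` produces one problem, because 1 KiB is below the 5 MiB minimum.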
##########
docs/multi-stage-query/reference.md:
##########
@@ -696,19 +696,55 @@ CLUSTERED BY user
## Durable Storage
-This section enumerates the advantages and performance implications of enabling durable storage while executing MSQ tasks.
+Using durable storage with your SQL-based ingestions can improve their reliability by writing intermediate files to a storage location temporarily.
To prevent durable storage from getting filled up with temporary files in case the tasks fail to clean them up, a periodic
cleaner can be scheduled to clean up the directories for which no controller task is running. It uses
the storage connector to work with the durable storage. The durable storage location should only be used to store the output
for the cluster's MSQ tasks. If the location contains other files or directories, then they will get cleaned up as well.
-Following table lists the properties that can be set to control the behavior of the durable storage of the cluster.
+
+### Enable durable storage
+
+To enable durable storage, you need to set the following common service properties:
+
+```
+druid.msq.intermediate.storage.enable=true
+druid.msq.intermediate.storage.type=s3
+druid.msq.intermediate.storage.bucket=YOUR_BUCKET
+druid.msq.intermediate.storage.prefix=YOUR_PREFIX
+druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir
+```
+
+For information about these settings and others related to durable storage, see [Durable storage configurations](#durable-storage-configurations).
+
+
+### Use durable storage for queries
+
+When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`.
+
+For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`.
+
+## Durable storage configurations
+
+The following common service properties control how durable storage behaves:
+
+|Parameter |Default | Description |
+|-------------------|----------------------------------------|----------------------|
+|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you want to store intermediate files. |
+| `druid.msq.intermediate.storage.chunkSize` | n/a | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. Druid computes the chunk size automatically if no value is provided.|
+|`druid.msq.intermediate.storage.enable` | true | Required. Whether to enable durable storage for the cluster.|
+| `druid.msq.intermediate.storage.maxTriesOnTransientErrors` | 10 | Optional. Defines the maximum number of times to attempt S3 API calls to avoid failures due to transient errors. |
+|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 | Required. The type of storage to use. Set this to either `local` or `s3`. |
+|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store intermediate stage results. Provide a unique value for the prefix. Don't share the same prefix between clusters. If the location includes other files or directories, then they will get cleaned up as well. |
+| `druid.msq.intermediate.storage.tempDir`| | Required. Directory path on the local disk to temporarily store intermediate stage results. |
+
+In addition to the common service properties, there are certain properties that you configure on the Overlord specifically:
Review Comment:
```suggestion
In addition to the common service properties, there are certain properties that you configure on the Overlord specifically to clean up intermediate files:
```
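The query-context rule in "Use durable storage for queries" can be sketched the same way. This minimal Java helper is hypothetical (not Druid code; `MsqQueryContextSketch` and its methods are made-up names) and only models what the diff states: `durableShuffleStorage: true` enables durable storage, and `faultTolerance: true` implies it. How Druid handles an explicit `durableShuffleStorage: false` combined with `faultTolerance: true` isn't covered by the diff, so the sketch simply overrides it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Druid code. It models the rule from the docs diff:
// setting faultTolerance to true also turns on durableShuffleStorage.
class MsqQueryContextSketch {
  // Returns a copy of the query context with the implied durableShuffleStorage value added.
  static Map<String, Object> resolve(Map<String, Object> context) {
    Map<String, Object> resolved = new HashMap<>(context);
    if (isTrue(resolved.get("faultTolerance"))) {
      // Assumption: an explicit durableShuffleStorage=false is simply overridden here;
      // the diff does not say how Druid treats that conflicting combination.
      resolved.put("durableShuffleStorage", true);
    }
    return resolved;
  }

  // This sketch accepts either a Boolean or a "true"/"false" string as the value.
  static boolean isTrue(Object value) {
    return value != null && "true".equalsIgnoreCase(value.toString());
  }

  // True if a query with this context would use durable storage for shuffles.
  static boolean usesDurableStorage(Map<String, Object> context) {
    return isTrue(resolve(context).get("durableShuffleStorage"));
  }
}
```

For example, `usesDurableStorage(Map.of("faultTolerance", true))` returns `true`, while an empty context returns `false`.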
##########
docs/multi-stage-query/security.md:
##########
@@ -50,3 +50,17 @@ To interact with a query through the Overlord API, users need the following perm
- `INSERT` or `REPLACE` queries: Users must have READ DATASOURCE permission on the output datasource.
- `SELECT` queries: Users must have read permissions on the `__query_select` datasource, which is a stub datasource that gets created.
+
+## S3
+
+The MSQ task engine can use S3 to store intermediate files when running queries. This can increase its reliability but requires certain permissions in S3.
Review Comment:
```suggestion
The MSQ task engine can use S3 to store intermediate files when running queries. This can increase its reliability but requires certain permissions in S3.
These permissions are required if you configure durable storage.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.