petermarshallio commented on a change in pull request #11490:
URL: https://github.com/apache/druid/pull/11490#discussion_r738338327
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
Review comment:
(See line 864)
Review comment:
Revisiting the wording on this @techdocsmith – I'm not sure what "Empty
objects starting with one of the given prefixes will be skipped." means here.
Maybe we revert this bit?
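For context, a minimal sketch of an `ioConfig` that uses `prefixes`, the
property in question. The bucket and path names are hypothetical, and the
`json` input format is only an assumption for illustration:

```json
{
  "type": "index_parallel",
  "inputSource": {
    "type": "s3",
    "prefixes": [
      "s3://example-bucket/path/to/data/",
      "s3://example-bucket/another/path/"
    ]
  },
  "inputFormat": {
    "type": "json"
  }
}
```

One possible reading of the sentence is that zero-byte objects found under
these prefixes are not ingested, but that is an interpretation and should be
confirmed before the wording is kept or reverted.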
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]