petermarshallio commented on a change in pull request #11490:
URL: https://github.com/apache/druid/pull/11490#discussion_r738338327



##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the input source.
 
 ### S3 Input Source
 
-> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source. 
+Use the *S3 input source* to read objects directly from S3-like storage.
 
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to [load](../development/extensions.html#loading-extensions) the [`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
 
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel task](#parallel-task).  Each `index_parallel` task will then read one or multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be ingested. Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|

Review comment:
       (See line 864)

Review comment:
       Revisiting the wording on this @techdocsmith – I'm not sure what "Empty 
objects starting with one of the given prefixes will be skipped." means here.  
Maybe we revert this bit?
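
If the intended meaning is that zero-byte objects found under a prefix are ignored, it could be sketched roughly like this. This is only one possible reading of the sentence, not Druid's actual code: `ListedObject` and `select_objects` are hypothetical names, and the `s3://foo/...` paths are made up for illustration.

```python
from typing import NamedTuple


class ListedObject(NamedTuple):
    """Hypothetical stand-in for one entry returned by an S3 listing."""
    uri: str
    size: int  # object size in bytes


def select_objects(listing, prefixes):
    """Keep objects whose URI starts with any prefix, skipping zero-byte ones."""
    return [
        obj.uri
        for obj in listing
        if obj.size > 0 and any(obj.uri.startswith(p) for p in prefixes)
    ]


listing = [
    ListedObject("s3://foo/bar/", 0),              # zero-byte "folder" marker
    ListedObject("s3://foo/bar/file1.json", 120),  # under the prefix, has data
    ListedObject("s3://foo/bar/empty.json", 0),    # under the prefix, but empty
    ListedObject("s3://foo/other/file2.json", 64), # outside the prefix
]

selected = select_objects(listing, ["s3://foo/bar/"])
print(selected)
```

Under this reading, only `file1.json` would be ingested. If that is what the sentence means, wording along the lines of "zero-byte objects under a prefix are skipped" might be clearer.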




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


