techdocsmith commented on a change in pull request #11490:
URL: https://github.com/apache/druid/pull/11490#discussion_r734892456
##########
File path: docs/development/extensions-core/s3.md
##########
@@ -36,7 +36,7 @@ The [S3 input
source](../../ingestion/native-batch.md#s3-input-source) is supported
to read objects directly from S3. If you use the [Hadoop
task](../../ingestion/hadoop.md),
you can read data from S3 by specifying the S3 paths in your
[`inputSpec`](../../ingestion/hadoop.md#inputspec).
-To configure the extension to read objects from S3 you need to configure how
to [connect to S3](#configuration).
+To configure the extension to read objects from S3 you need to configure Druid
to [connect to S3](#configuration).
Review comment:
```suggestion
To configure the extension to read objects from S3, supply the S3
[connection information](#configuration).
```
##########
File path: docs/development/extensions-core/s3.md
##########
@@ -76,14 +77,15 @@ Druid uses the following credentials provider chain to
connect to your S3 bucket
|6|ECS container credentials|Based on environment variables available on AWS
ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or
AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the
[EC2ContainerCredentialsProviderWrapper
documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
|7|Instance profile information|Based on the instance profile you may have
attached to your druid instance|
-You can find more information about authentication method
[here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)<br/>
-**Note :** *Order is important here as it indicates the precedence of
authentication methods.<br/>
-So if you are trying to use Instance profile information, you **must not** set
`druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+> You can find more information about authentication methods in the [Amazon
Developer
Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
+
+> Order is important here as it indicates the precedence of authentication
methods. If you are trying to use Instance profile information, you **must
not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid
runtime.properties.
+> You can use the property
[`druid.startup.logging.maskProperties`](../../configuration/index.html#startup-logging)
to mask credentials information in Druid logs. For example, `["password",
"secretKey", "awsSecretAccessKey"]`.
Review comment:
```suggestion
You can use the property
[`druid.startup.logging.maskProperties`](../../configuration/index.html#startup-logging)
to mask credentials information in Druid logs. For example, `["password",
"secretKey", "awsSecretAccessKey"]`.
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
Review comment:
```suggestion
|`type`|Set value to `s3`.|None|yes|
```
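
For context on the suggested wording, a minimal `inputSource` sketch using `uris` might look like the following (the bucket name and object paths are hypothetical, invented purely for illustration):

```json
{
  "type": "s3",
  "uris": [
    "s3://my-example-bucket/data/events-2021-01.json",
    "s3://my-example-bucket/data/events-2021-02.json"
  ]
}
```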
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
Review comment:
```suggestion
Specify objects to ingest as either:
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be
ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding
the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then
ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how
Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is
specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
+
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or
plain text string of this S3 InputSource's access key|None|yes if
secretAccessKey is given|
+|`secretAccessKey`|The [Password Provider](../operations/password-provider.md)
or plain text string of this S3 InputSource's secret key|None|yes if
accessKeyId is given|
+|`assumeRoleArn`|AWS ARN of the role to assume. See the [AWS User
Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).
`assumeRoleArn` can be used either with the ingestion spec AWS credentials or
with the default S3 credentials|None|no|
+|`assumeRoleExternalId`|A unique identifier that might be required when you
assume a role in another account. See the [AWS User
Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).|None|no|
+
+> If `accessKeyId` and `secretAccessKey` are not given, then the default [S3
credentials provider
chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.
Review comment:
```suggestion
If you do not supply an `accessKeyId` and `secretAccessKey`, Druid uses the
default [S3 credentials provider
chain](../development/extensions-core/s3.md#s3-authentication-methods).
```
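
As a sketch of the alternative (supplying credentials instead of relying on the provider chain), a `properties` object can reference a [Password Provider](../operations/password-provider.md) rather than plain text. The bucket prefix and environment-variable names below are assumptions for illustration only:

```json
{
  "type": "s3",
  "prefixes": ["s3://my-example-bucket/events/"],
  "properties": {
    "accessKeyId": {
      "type": "environment",
      "variable": "S3_ACCESS_KEY"
    },
    "secretAccessKey": {
      "type": "environment",
      "variable": "S3_SECRET_KEY"
    }
  }
}
```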
##########
File path: docs/development/extensions-core/s3.md
##########
@@ -64,7 +64,8 @@ In addition to this you need to set additional configuration,
specific for [deep
### S3 authentication methods
Druid uses the following credentials provider chain to connect to your S3
bucket (whether a deep storage bucket or source bucket).
-**Note :** *You can override the default credentials provider chain for
connecting to source bucket by specifying an access key and secret key using
[Properties Object](../../ingestion/native-batch.md#s3-input-source) parameters
in the ingestionSpec.*
+
+> You can override the default credentials provider chain for connecting to
the source bucket by specifying an access key and secret key using [Properties
Object](../../ingestion/native-batch.md#s3-input-source) parameters in the
ingestion specification.
Review comment:
```suggestion
> To override the default credentials provider chain for connecting to the
source bucket, specify an access key and secret key using [Properties
Object](../../ingestion/native-batch.md#s3-input-source) parameters in the
ingestion specification.
```
##########
File path: docs/development/extensions-core/s3.md
##########
@@ -76,14 +77,15 @@ Druid uses the following credentials provider chain to
connect to your S3 bucket
|6|ECS container credentials|Based on environment variables available on AWS
ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or
AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the
[EC2ContainerCredentialsProviderWrapper
documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
|7|Instance profile information|Based on the instance profile you may have
attached to your druid instance|
-You can find more information about authentication method
[here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)<br/>
-**Note :** *Order is important here as it indicates the precedence of
authentication methods.<br/>
-So if you are trying to use Instance profile information, you **must not** set
`druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+> You can find more information about authentication methods in the [Amazon
Developer
Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
Review comment:
```suggestion
For more information, refer to the [Amazon Developer
Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
```
##########
File path: docs/development/extensions-core/s3.md
##########
@@ -76,14 +77,15 @@ Druid uses the following credentials provider chain to
connect to your S3 bucket
|6|ECS container credentials|Based on environment variables available on AWS
ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or
AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the
[EC2ContainerCredentialsProviderWrapper
documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
|7|Instance profile information|Based on the instance profile you may have
attached to your druid instance|
-You can find more information about authentication method
[here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)<br/>
-**Note :** *Order is important here as it indicates the precedence of
authentication methods.<br/>
-So if you are trying to use Instance profile information, you **must not** set
`druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+> You can find more information about authentication methods in the [Amazon
Developer
Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
+
+> Order is important here as it indicates the precedence of authentication
methods. If you are trying to use Instance profile information, you **must
not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid
runtime.properties.
Review comment:
```suggestion
The order of configuration parameters is important here because it indicates
the precedence of authentication methods. If you are trying to use Instance
profile information, do not set `druid.s3.accessKey` and `druid.s3.secretKey`
in your Druid runtime.properties.
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
Review comment:
```suggestion
|`uris`| JSON array of URIs defining the location of S3 objects to ingest
|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
Review comment:
```suggestion
The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). In this case each `index_parallel` task reads one or
more objects.
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be
ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding
the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then
ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how
Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is
specified.
Review comment:
```suggestion
The S3 input source skips all empty objects only when `prefixes` is
specified.
```
##########
File path: docs/development/extensions-core/s3.md
##########
@@ -76,14 +77,15 @@ Druid uses the following credentials provider chain to
connect to your S3 bucket
|6|ECS container credentials|Based on environment variables available on AWS
ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or
AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the
[EC2ContainerCredentialsProviderWrapper
documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
|7|Instance profile information|Based on the instance profile you may have
attached to your druid instance|
-You can find more information about authentication method
[here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)<br/>
-**Note :** *Order is important here as it indicates the precedence of
authentication methods.<br/>
-So if you are trying to use Instance profile information, you **must not** set
`druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+> You can find more information about authentication methods in the [Amazon
Developer
Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
+
+> Order is important here as it indicates the precedence of authentication
methods. If you are trying to use Instance profile information, you **must
not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid
runtime.properties.
+> You can use the property
[`druid.startup.logging.maskProperties`](../../configuration/index.html#startup-logging)
to mask credentials information in Druid logs. For example, `["password",
"secretKey", "awsSecretAccessKey"]`.
### S3 permissions settings
-`s3:GetObject` and `s3:PutObject` are basically required for pushing/loading
segments to/from S3.
+`s3:GetObject` and `s3:PutObject` are required for pushing / pulling segments
to / from S3.
Review comment:
```suggestion
`s3:GetObject` and `s3:PutObject` are required for pushing or pulling
segments to or from S3.
```
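
To make the permissions requirement concrete, those two actions might appear in an IAM policy along the lines of this sketch (the bucket name is hypothetical, and a real deployment may need additional actions such as listing, depending on how Druid is configured):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-druid-bucket/*"
    }
  ]
}
```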
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be
ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding
the default S3 configuration.|None|No (defaults will be used if not given)
Review comment:
```suggestion
|[`properties`](#s3-input-properties-object)|Properties Object to override
the default S3 configuration.|None|No (defaults will be used if not given)
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
Review comment:
```suggestion
- a list of S3 URI strings
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be
ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding
the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then
ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how
Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is
specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
+
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or
plain text string of this S3 InputSource's access key|None|yes if
secretAccessKey is given|
+|`secretAccessKey`|The [Password Provider](../operations/password-provider.md)
or plain text string of this S3 InputSource's secret key|None|yes if
accessKeyId is given|
+|`assumeRoleArn`|AWS ARN of the role to assume. See the [AWS User
Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).
`assumeRoleArn` can be used either with the ingestion spec AWS credentials or
with the default S3 credentials|None|no|
+|`assumeRoleExternalId`|A unique identifier that might be required when you
assume a role in another account. See the [AWS User
Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).|None|no|
+
+> If `accessKeyId` and `secretAccessKey` are not given, then the default [S3
credentials provider
chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.
+
+#### S3 Input Examples
Review comment:
```suggestion
#### S3 input examples
```
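
Under that heading, one of the examples could be an `ioConfig` fragment like this sketch using `prefixes` (datasource-specific values are invented for illustration):

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "s3",
    "prefixes": ["s3://my-example-bucket/prefix1/", "s3://my-example-bucket/prefix2/"]
  },
  "inputFormat": {
    "type": "json"
  }
}
```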
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be
ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding
the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then
ingest
+*all* objects contained in the `prefixes` you specify.
Review comment:
```suggestion
all objects contained in the specified prefixes.
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
### S3 Input Source
-> You need to include the
[`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension
to use the S3 input source.
+Use the *S3 input source* to read objects directly from S3-like storage.
-The S3 input source is to support reading objects directly from S3.
-Objects can be specified either via a list of S3 URI strings or a list of
-S3 location prefixes, which will attempt to list the contents and ingest
-all objects contained in the locations. The S3 input source is splittable
-and can be used by the [Parallel task](#parallel-task),
-where each worker task of `index_parallel` will read one or multiple objects.
+> To ingest from S3-type storage, you need to
[load](../development/extensions.html#loading-extensions) the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension.
-Sample specs:
+> The S3 input source is splittable, meaning it can be used by the [Parallel
task](#parallel-task). Each `index_parallel` task will then read one or
multiple objects.
+
+Objects to ingest can be specified as:
+
+- a list of S3 URI strings or
+- a list of S3 location prefixes
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`type`|This must be `s3`.|None|yes|
+|`uris`|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be
ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
+|[`properties`](#s3-input-properties-object)|Properties Object for overriding
the default S3 configuration.|None|No (defaults will be used if not given)
+
+> When you supply a list of `prefixes`, Druid will list the contents and then
ingest
+*all* objects contained in the `prefixes` you specify.
+
+> You can view the payload of individual `index_parallel` tasks to see how
Druid has divided up the work of ingestion.
+
+> The S3 input source will skip all empty objects only when `prefixes` is
specified.
+
+#### S3 Input Objects
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`bucket`|Name of the S3 bucket|None|yes|
+|`path`|The path where data is located.|None|yes|
Review comment:
```suggestion
|`path`|The path to the data|None|yes|
```
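For readers of this thread, a short sketch of how `bucket` and `path` combine in an `objects` entry (the bucket and path names here are made up for illustration, not taken from the PR):

```json
"inputSource": {
  "type": "s3",
  "objects": [
    { "bucket": "wikipedia-batch", "path": "2021/07/01/events.json" },
    { "bucket": "wikipedia-batch", "path": "2021/07/02/events.json" }
  ]
}
```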
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
+
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or
plain text string of this S3 InputSource's access key|None|yes if
secretAccessKey is given|
+|`secretAccessKey`|The [Password Provider](../operations/password-provider.md)
or plain text string of this S3 InputSource's secret key|None|yes if
accessKeyId is given|
+|`assumeRoleArn`|AWS ARN of the role to assume. See the [AWS User
Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).
`assumeRoleArn` can be used either with the ingestion spec AWS credentials or
with the default S3 credentials|None|no|
+|`assumeRoleExternalId`|A unique identifier that might be required when you
assume a role in another account. See the [AWS User
Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).|None|no|
+
+> If `accessKeyId` and `secretAccessKey` are not given, then the default [S3
credentials provider
chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.
+
+#### S3 Input Examples
+
+Using URIs, this ingestion specification will ingest two specific objects:
Review comment:
```suggestion
Using URIs, the following ingestion specification ingests two specific
objects:
```
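A minimal sketch of such a URI-based `inputSource` (bucket and object names are illustrative, not from the PR):

```json
"inputSource": {
  "type": "s3",
  "uris": [
    "s3://wikipedia-batch/2021/07/01/events.json",
    "s3://wikipedia-batch/2021/07/02/events.json"
  ]
}
```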
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
+|`prefixes`|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
Review comment:
```suggestion
|`prefixes`| JSON array of URIs defining the URI prefixes for the locations
of S3 objects to ingest. Druid skips empty objects starting with one of the
given prefixes.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects)
must be set|
```
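For illustration, a `prefixes`-based `inputSource` might look like the following (the prefix URIs are hypothetical). Druid lists each prefix and ingests every non-empty object underneath it:

```json
"inputSource": {
  "type": "s3",
  "prefixes": [
    "s3://wikipedia-batch/2021/07/",
    "s3://wikipedia-batch/2021/08/"
  ]
}
```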
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
+#### S3 Input Properties Object
Review comment:
```suggestion
#### S3 input properties object
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -941,33 +985,7 @@ Sample specs:
...
```
-|property|description|default|required?|
-|--------|-----------|-------|---------|
-|type|This should be `s3`.|None|yes|
-|uris|JSON array of URIs where S3 objects to be ingested are
located.|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of S3 objects to be
ingested. Empty objects starting with one of the given prefixes will be
skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of S3 Objects to be ingested.|None|`uris` or `prefixes` or
`objects` must be set|
-|properties|Properties Object for overriding the default S3 configuration. See
below for more information.|None|No (defaults will be used if not given)
-
-Note that the S3 input source will skip all empty objects only when `prefixes`
is specified.
-
-S3 Object:
-
-|property|description|default|required?|
-|--------|-----------|-------|---------|
-|bucket|Name of the S3 bucket|None|yes|
-|path|The path where data is located.|None|yes|
-
-Properties Object:
-
-|property|description|default|required?|
-|--------|-----------|-------|---------|
-|accessKeyId|The [Password Provider](../operations/password-provider.md) or
plain text string of this S3 InputSource's access key|None|yes if
secretAccessKey is given|
-|secretAccessKey|The [Password Provider](../operations/password-provider.md)
or plain text string of this S3 InputSource's secret key|None|yes if
accessKeyId is given|
-|assumeRoleArn|AWS ARN of the role to assume
[see](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html).
**assumeRoleArn** can be used either with the ingestion spec AWS credentials
or with the default S3 credentials|None|no|
-|assumeRoleExternalId|A unique identifier that might be required when you
assume a role in another account
[see](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html)|None|no|
-
-**Note :** *If accessKeyId and secretAccessKey are not given, the default [S3
credentials provider
chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.*
+> Read more about S3 and Druid on the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension page,
including using S3-like for [Deep Storage](../dependencies/deep-storage.html),
more about authentication, and additional configuration options.
Review comment:
```suggestion
Learn more about S3 and Druid on the
[`druid-s3-extensions`](../development/extensions-core/s3.md) extension page,
including using S3-like for [Deep Storage](../dependencies/deep-storage.html),
more about authentication, and additional configuration options.
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -900,6 +940,8 @@ Sample specs:
...
```
+This ingestion specification provides task-specific credentials to ingest two
specific objects:
Review comment:
```suggestion
The following ingestion specification provides task-specific credentials to
ingest two specific objects:
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
+|[`objects`](#s3-input-objects)|JSON array of S3 Objects to be
ingested.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be
set|
Review comment:
```suggestion
|[`objects`](#s3-input-objects)|JSON array of S3 Objects to
ingest.|None|`uris` or `prefixes` or [`objects`](#s3-input-objects) must be set|
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -880,6 +919,7 @@ Sample specs:
...
```
+This time using `objects`, this specification will ingest two specific
objects, one from the `foo` bucket, one from the `bar` bucket:
Review comment:
```suggestion
The following example uses `objects` to ingest two specific objects, one
from the `foo` bucket, one from the `bar` bucket:
```
When possible, opt for "real world" examples over "foo" and "bar".
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or
plain text string of this S3 InputSource's access key|None|yes if
secretAccessKey is given|
+|`secretAccessKey`|The [Password Provider](../operations/password-provider.md)
or plain text string of this S3 InputSource's secret key|None|yes if
accessKeyId is given|
Review comment:
```suggestion
|`secretAccessKey`|The [Password
Provider](../operations/password-provider.md) or plain text string of the S3
InputSource's secret key|None|yes if accessKeyId is given|
```
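To make the credential rows concrete, here is a hedged sketch of a `properties` object. It assumes the environment-variable form of the Password Provider; the variable names and URI are illustrative, not from the PR:

```json
"inputSource": {
  "type": "s3",
  "uris": ["s3://wikipedia-batch/2021/07/01/events.json"],
  "properties": {
    "accessKeyId": {
      "type": "environment",
      "variable": "AWS_ACCESS_KEY_ID"
    },
    "secretAccessKey": {
      "type": "environment",
      "variable": "AWS_SECRET_ACCESS_KEY"
    }
  }
}
```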
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
+> You can view the payload of individual `index_parallel` tasks to see how
Druid has divided up the work of ingestion.
Review comment:
```suggestion
You can view the payload of individual `index_parallel` tasks to see how
Druid has divided up the work of ingestion.
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
+#### S3 Input Properties Object
+
+|property|description|default|required?|
+|--------|-----------|-------|---------|
+|`accessKeyId`|The [Password Provider](../operations/password-provider.md) or
plain text string of this S3 InputSource's access key|None|yes if
secretAccessKey is given|
Review comment:
```suggestion
|`accessKeyId`|The [Password Provider](../operations/password-provider.md)
or plain text string of the S3 InputSource's access key|None|yes if
secretAccessKey is given|
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -864,6 +901,8 @@ Sample specs:
...
```
+This specification will ingest all the objects in two locations given in
`prefixes`:
Review comment:
```suggestion
The following specification ingests all the objects in two locations given
in `prefixes`:
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -837,16 +837,53 @@ Only the native Parallel task and Simple task support the
input source.
+> When you supply a list of `prefixes`, Druid will list the contents and then
ingest
Review comment:
```suggestion
When you supply a list of `prefixes`, Druid lists the contents and then
ingests
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]