techdocsmith commented on a change in pull request #10830:
URL: https://github.com/apache/druid/pull/10830#discussion_r573989831
##########
File path: docs/configuration/index.md
##########
@@ -515,6 +515,36 @@ This deep storage is used to interface with Cassandra.
Note that the `druid-cas
|`druid.storage.keyspace`|Cassandra key space.|none|
+### Ingestion Security Configuration
+
+#### HDFS input source
+
+You can set the following property to control what protocols are allowed for
+the [HDFS input source](../ingestion/native-batch.md#hdfs-input-source) and the [HDFS firehose](../ingestion/native-batch.md#hdfsfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols that HDFS input source and HDFS firehose can use.|["hdfs"]|
Review comment:
```suggestion
|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols for the HDFS input source and HDFS firehose.|["hdfs"]|
```
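
For reference, a minimal sketch of how this property might appear in a Druid `common.runtime.properties` file (the value shown is the documented default):

```properties
# Restrict the HDFS input source and HDFS firehose to the hdfs protocol
# (the documented default for this property).
druid.ingestion.hdfs.allowedProtocols=["hdfs"]
```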
##########
File path: docs/configuration/index.md
##########
@@ -515,6 +515,36 @@ This deep storage is used to interface with Cassandra.
Note that the `druid-cas
|`druid.storage.keyspace`|Cassandra key space.|none|
+### Ingestion Security Configuration
+
+#### HDFS input source
+
+You can set the following property to control what protocols are allowed for
Review comment:
```suggestion
You can set the following property to specify permissible protocols for
```
##########
File path: docs/configuration/index.md
##########
@@ -515,6 +515,36 @@ This deep storage is used to interface with Cassandra.
Note that the `druid-cas
|`druid.storage.keyspace`|Cassandra key space.|none|
+### Ingestion Security Configuration
+
+#### HDFS input source
+
+You can set the following property to control what protocols are allowed for
+the [HDFS input source](../ingestion/native-batch.md#hdfs-input-source) and the [HDFS firehose](../ingestion/native-batch.md#hdfsfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols that HDFS input source and HDFS firehose can use.|["hdfs"]|
+
+
+#### HTTP input source
+
+You can set the following property to control what protocols are allowed for
+the [HTTP input source](../ingestion/native-batch.md#http-input-source) and the [HTTP firehose](../ingestion/native-batch.md#httpfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.http.allowedProtocols`|List of protocols|Allowed protocols that HTTP input source and HTTP firehose can use.|["http", "https"]|
+
+The following properties are to control what domains native batch tasks can access to using
Review comment:
```suggestion
The following properties control the domains native batch tasks can access using
```
##########
File path: docs/configuration/index.md
##########
@@ -515,6 +515,36 @@ This deep storage is used to interface with Cassandra.
Note that the `druid-cas
|`druid.storage.keyspace`|Cassandra key space.|none|
+### Ingestion Security Configuration
+
+#### HDFS input source
+
+You can set the following property to control what protocols are allowed for
+the [HDFS input source](../ingestion/native-batch.md#hdfs-input-source) and the [HDFS firehose](../ingestion/native-batch.md#hdfsfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols that HDFS input source and HDFS firehose can use.|["hdfs"]|
+
+
+#### HTTP input source
+
+You can set the following property to control what protocols are allowed for
+the [HTTP input source](../ingestion/native-batch.md#http-input-source) and the [HTTP firehose](../ingestion/native-batch.md#httpfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.http.allowedProtocols`|List of protocols|Allowed protocols that HTTP input source and HTTP firehose can use.|["http", "https"]|
+
+The following properties are to control what domains native batch tasks can access to using
+the [HTTP input source](../ingestion/native-batch.md#http-input-source).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.http.allowListDomains`|List of domains|Allowed domains from which ingestion will be allowed. Only one of allowList or denyList can be set.|empty list|
Review comment:
```suggestion
|`druid.ingestion.http.allowListDomains`|List of domains|Domains allowed for use as an ingestion source. You cannot use both `allowList` and `denyList`.|empty list|
```
##########
File path: docs/configuration/index.md
##########
@@ -515,6 +515,36 @@ This deep storage is used to interface with Cassandra.
Note that the `druid-cas
|`druid.storage.keyspace`|Cassandra key space.|none|
+### Ingestion Security Configuration
+
+#### HDFS input source
+
+You can set the following property to control what protocols are allowed for
+the [HDFS input source](../ingestion/native-batch.md#hdfs-input-source) and the [HDFS firehose](../ingestion/native-batch.md#hdfsfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols that HDFS input source and HDFS firehose can use.|["hdfs"]|
+
+
+#### HTTP input source
+
+You can set the following property to control what protocols are allowed for
Review comment:
```suggestion
You can set the following property to specify permissible protocols for
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -1127,9 +1127,10 @@ Sample specs:
|type|This should be `hdfs`.|None|yes|
|paths|HDFS paths. Can be either a JSON array or comma-separated string of paths. Wildcards like `*` are supported in these paths. Empty files located under one of the given paths will be skipped.|None|yes|
-You can also ingest from cloud storage using the HDFS input source.
-However, if you want to read from AWS S3 or Google Cloud Storage, consider using
-the [S3 input source](#s3-input-source) or the [Google Cloud Storage input source](#google-cloud-storage-input-source) instead.
+You can also ingest from other storage using the HDFS input source if the HDFS client supports that storage.
+However, if you want to ingest from cloud storage, consider using the proper input sources for them.
+If you want to use a non-hdfs protocol with the HDFS input source, you need to include the protocol you want
Review comment:
```suggestion
If you want to use a non-hdfs protocol with the HDFS input source, include the protocol
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -1553,6 +1557,11 @@ Note that prefetching or caching isn't that useful in the Parallel task.
|fetchTimeout|Timeout for fetching each file.|60000|
|maxFetchRetry|Maximum number of retries for fetching each file.|3|
+You can also ingest from other storage using the HDFS firehose if the HDFS client supports that storage.
+However, if you want to ingest from cloud storage, consider using the proper input sources for them.
Review comment:
```suggestion
However, if you want to ingest from cloud storage, consider using the service-specific input source for your cloud storage.
```
##########
File path: docs/configuration/index.md
##########
@@ -515,6 +515,36 @@ This deep storage is used to interface with Cassandra.
Note that the `druid-cas
|`druid.storage.keyspace`|Cassandra key space.|none|
+### Ingestion Security Configuration
+
+#### HDFS input source
+
+You can set the following property to control what protocols are allowed for
+the [HDFS input source](../ingestion/native-batch.md#hdfs-input-source) and the [HDFS firehose](../ingestion/native-batch.md#hdfsfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols that HDFS input source and HDFS firehose can use.|["hdfs"]|
+
+
+#### HTTP input source
+
+You can set the following property to control what protocols are allowed for
+the [HTTP input source](../ingestion/native-batch.md#http-input-source) and the [HTTP firehose](../ingestion/native-batch.md#httpfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.http.allowedProtocols`|List of protocols|Allowed protocols that HTTP input source and HTTP firehose can use.|["http", "https"]|
+
+The following properties are to control what domains native batch tasks can access to using
+the [HTTP input source](../ingestion/native-batch.md#http-input-source).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.http.allowListDomains`|List of domains|Allowed domains from which ingestion will be allowed. Only one of allowList or denyList can be set.|empty list|
+|`druid.ingestion.http.denyListDomains`|List of domains|Blacklisted domains from which ingestion will NOT be allowed. Only one of allowList or denyList can be set. |empty list|
Review comment:
```suggestion
|`druid.ingestion.http.denyListDomains`|List of domains|Domains not allowed for use as an ingestion source. You cannot use both `denyList` and `allowList`. |empty list|
```
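
As a sketch, these HTTP ingestion properties might be combined in `common.runtime.properties` as follows (`example.com` is a hypothetical domain; the protocol list shown is the documented default, and only one of the two domain lists may be set):

```properties
# Allow only plain HTTP and HTTPS for the HTTP input source and firehose
# (the documented default).
druid.ingestion.http.allowedProtocols=["http", "https"]

# Permit ingestion only from a hypothetical trusted domain.
# Do not set denyListDomains at the same time; the two are mutually exclusive.
druid.ingestion.http.allowListDomains=["example.com"]
```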
##########
File path: docs/ingestion/native-batch.md
##########
@@ -1590,6 +1599,9 @@ A sample HTTP Firehose spec is shown below:
}
```
+The protocols that the HTTP firehose can use is restricted by `druid.ingestion.http.allowedProtocols`.
Review comment:
```suggestion
You can only use protocols listed in the `druid.ingestion.http.allowedProtocols` property as HTTP firehose input sources.
```
##########
File path: docs/configuration/index.md
##########
@@ -515,6 +515,36 @@ This deep storage is used to interface with Cassandra.
Note that the `druid-cas
|`druid.storage.keyspace`|Cassandra key space.|none|
+### Ingestion Security Configuration
+
+#### HDFS input source
+
+You can set the following property to control what protocols are allowed for
+the [HDFS input source](../ingestion/native-batch.md#hdfs-input-source) and the [HDFS firehose](../ingestion/native-batch.md#hdfsfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols that HDFS input source and HDFS firehose can use.|["hdfs"]|
+
+
+#### HTTP input source
+
+You can set the following property to control what protocols are allowed for
+the [HTTP input source](../ingestion/native-batch.md#http-input-source) and the [HTTP firehose](../ingestion/native-batch.md#httpfirehose).
+
+|Property|Possible Values|Description|Default|
+|--------|---------------|-----------|-------|
+|`druid.ingestion.http.allowedProtocols`|List of protocols|Allowed protocols that HTTP input source and HTTP firehose can use.|["http", "https"]|
Review comment:
```suggestion
|`druid.ingestion.http.allowedProtocols`|List of protocols|Allowed protocols for the HTTP input source and HTTP firehose.|["http", "https"]|
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -1203,10 +1204,13 @@ You can also use the other existing Druid PasswordProviders. Here is an example
|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be `http`|None|yes|
-|uris|URIs of the input files.|None|yes|
+|uris|URIs of the input files. See below for the protocols allowed for URIs.|None|yes|
|httpAuthenticationUsername|Username to use for authentication with specified URIs. Can be optionally used if the URIs specified in the spec require a Basic Authentication Header.|None|no|
|httpAuthenticationPassword|PasswordProvider to use with specified URIs. Can be optionally used if the URIs specified in the spec require a Basic Authentication Header.|None|no|
+The protocols that the HTTP input source can use is restricted by `druid.ingestion.http.allowedProtocols`.
Review comment:
```suggestion
You can only use protocols listed in the `druid.ingestion.http.allowedProtocols` property as HTTP input sources.
```
##########
File path: docs/ingestion/native-batch.md
##########
@@ -1127,9 +1127,10 @@ Sample specs:
|type|This should be `hdfs`.|None|yes|
|paths|HDFS paths. Can be either a JSON array or comma-separated string of paths. Wildcards like `*` are supported in these paths. Empty files located under one of the given paths will be skipped.|None|yes|
-You can also ingest from cloud storage using the HDFS input source.
-However, if you want to read from AWS S3 or Google Cloud Storage, consider using
-the [S3 input source](#s3-input-source) or the [Google Cloud Storage input source](#google-cloud-storage-input-source) instead.
+You can also ingest from other storage using the HDFS input source if the HDFS client supports that storage.
+However, if you want to ingest from cloud storage, consider using the proper input sources for them.
Review comment:
```suggestion
However, if you want to ingest from cloud storage, consider using the service-specific input source for your cloud storage.
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]