This is an automated email from the ASF dual-hosted git repository.
shetland pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new fd0931d Azure data lake input source (#11153)
fd0931d is described below
commit fd0931d35ee42223669af790170336a4309a9b3c
Author: sthetland <[email protected]>
AuthorDate: Fri Jun 25 15:54:34 2021 -0700
Azure data lake input source (#11153)
* Mention Azure Data Lake
* Make consistent with other entries
Co-authored-by: Charles Smith <[email protected]>
---
docs/ingestion/native-batch.md | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index e445b4e..df7a097 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -58,7 +58,7 @@ The supported splittable input formats for now are:
- [`s3`](#s3-input-source) reads data from AWS S3 storage.
- [`gs`](#google-cloud-storage-input-source) reads data from Google Cloud Storage.
-- [`azure`](#azure-input-source) reads data from Azure Blob Storage.
+- [`azure`](#azure-input-source) reads data from Azure Blob Storage and Azure Data Lake.
- [`hdfs`](#hdfs-input-source) reads data from HDFS storage.
- [`http`](#http-input-source) reads data from HTTP servers.
- [`local`](#local-input-source) reads data from local storage.
@@ -1046,10 +1046,8 @@ Google Cloud Storage object:
> You need to include the
> [`druid-azure-extensions`](../development/extensions-core/azure.md) as an
> extension to use the Azure input source.
-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
+specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use
+with [Parallel task](#parallel-task) indexing, where each worker task reads one chunk of the split data.
Sample specs:
@@ -1108,17 +1106,17 @@ Sample specs:
|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be `azure`.|None|yes|
-|uris|JSON array of URIs where Azure Blob objects to be ingested are located. Should be in form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of Azure Blob objects to be ingested. Should be in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of Azure Blob objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to ingest, in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
+|objects|JSON array of Azure objects to ingest.|None|`uris` or `prefixes` or `objects` must be set|
-Note that the Azure input source will skip all empty objects only when `prefixes` is specified.
+Note that the Azure input source skips all empty objects only when `prefixes` is specified.
-Azure Blob object:
+The `objects` property is:
|property|description|default|required?|
|--------|-----------|-------|---------|
-|bucket|Name of the Azure Blob Storage container|None|yes|
+|bucket|Name of the Azure Blob Storage or Azure Data Lake container|None|yes|
|path|The path where data is located.|None|yes|
### HDFS Input Source
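For context, an `azure` input source spec using the `uris` form described by the patched property table might look like the following sketch. The container and object paths are hypothetical, and surrounding spec fields are abbreviated:

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "azure",
    "uris": ["azure://container/prefix1/file.json", "azure://container/prefix2/file2.json"]
  },
  "inputFormat": {
    "type": "json"
  },
  ...
}
```

Per the property table, the `prefixes` or `objects` key would replace `uris` in the variants that ingest by prefix or by explicit object list; exactly one of the three must be set.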
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]