This is an automated email from the ASF dual-hosted git repository.
shetland pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new fd0931d Azure data lake input source (#11153)
fd0931d is described below
commit fd0931d35ee42223669af790170336a4309a9b3c
Author: sthetland <[email protected]>
AuthorDate: Fri Jun 25 15:54:34 2021 -0700
Azure data lake input source (#11153)
* Mention Azure Data Lake
* Make consistent with other entries
Co-authored-by: Charles Smith <[email protected]>
---
docs/ingestion/native-batch.md | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index e445b4e..df7a097 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -58,7 +58,7 @@ The supported splittable input formats for now are:
- [`s3`](#s3-input-source) reads data from AWS S3 storage.
- [`gs`](#google-cloud-storage-input-source) reads data from Google Cloud Storage.
-- [`azure`](#azure-input-source) reads data from Azure Blob Storage.
+- [`azure`](#azure-input-source) reads data from Azure Blob Storage and Azure Data Lake.
- [`hdfs`](#hdfs-input-source) reads data from HDFS storage.
- [`http`](#http-input-source) reads data from HTTP servers.
- [`local`](#local-input-source) reads data from local storage.
@@ -1046,10 +1046,8 @@ Google Cloud Storage object:
> You need to include the
> [`druid-azure-extensions`](../development/extensions-core/azure.md) as an
> extension to use the Azure input source.
-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
+specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use
+with [Parallel task](#parallel-task) indexing, where each worker task reads one chunk of the split data.
Sample specs:
@@ -1108,17 +1106,17 @@ Sample specs:
|property|description|default|required?|
|--------|-----------|-------|---------|
|type|This should be `azure`.|None|yes|
-|uris|JSON array of URIs where Azure Blob objects to be ingested are located. Should be in form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of Azure Blob objects to be ingested. Should be in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of Azure Blob objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to ingest, in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
+|objects|JSON array of Azure objects to ingest.|None|`uris` or `prefixes` or `objects` must be set|
-Note that the Azure input source will skip all empty objects only when `prefixes` is specified.
+Note that the Azure input source skips all empty objects only when `prefixes` is specified.
-Azure Blob object:
+The `objects` property is:
|property|description|default|required?|
|--------|-----------|-------|---------|
-|bucket|Name of the Azure Blob Storage container|None|yes|
+|bucket|Name of the Azure Blob Storage or Azure Data Lake container|None|yes|
|path|The path where data is located.|None|yes|
### HDFS Input Source
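For context, an `azure` input source spec using the `uris` form described by the patched property table might look like the following sketch. The container and object paths are hypothetical, and surrounding spec fields are abbreviated:

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "azure",
    "uris": ["azure://container/prefix1/file.json", "azure://container/prefix2/file2.json"]
  },
  "inputFormat": {
    "type": "json"
  },
  ...
}
```

Per the property table, the `prefixes` or `objects` key would replace `uris` in the variants that ingest by prefix or by explicit object list; exactly one of the three must be set.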
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]