techdocsmith commented on a change in pull request #11153:
URL: https://github.com/apache/druid/pull/11153#discussion_r619340293
##########
File path: docs/ingestion/native-batch.md
##########
@@ -1004,10 +1004,8 @@ Google Cloud Storage object:
> You need to include the
> [`druid-azure-extensions`](../development/extensions-core/azure.md) as an
> extension to use the Azure input source.
-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source is used to read objects directly from Azure Blob store or Azure Data Lake sources. Objects can be
Review comment:
```suggestion
The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
```
nit
##########
File path: docs/ingestion/native-batch.md
##########
@@ -1004,10 +1004,8 @@ Google Cloud Storage object:
> You need to include the
> [`druid-azure-extensions`](../development/extensions-core/azure.md) as an
> extension to use the Azure input source.
-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source is used to read objects directly from Azure Blob store or Azure Data Lake sources. Objects can be
+specified as a list of file URI strings or prefixes. The Azure input source is splittable and can be used by the [Parallel task](#parallel-task), where each worker task reads a single object.
Review comment:
```suggestion
specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [Parallel task](#parallel-task) indexing and each worker task reads one chunk of the split data.
```
I think we should differentiate between the `single object` and the sections of the split-out object, since we're using `object` for the whole.
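For context on the prose being revised, a minimal sketch of how the Azure input source might appear inside an `index_parallel` ingestion spec (the `type`, `uris`, and `prefixes` fields are per the Azure input source docs; the container and path values here are hypothetical):

```json
{
  "type": "index_parallel",
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "azure",
      "uris": ["azure://my-container/path/to/file.json"],
      "prefixes": ["azure://my-container/path/to/"]
    },
    "inputFormat": {
      "type": "json"
    }
  }
}
```

In practice you would supply either `uris` or `prefixes` (or `objects`), not both; each split produced from the list is handed to one worker task.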
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]