zachjsh commented on code in PR #15630:
URL: https://github.com/apache/druid/pull/15630#discussion_r1454077366


##########
docs/ingestion/input-sources.md:
##########
@@ -384,6 +384,103 @@ The `objects` property is:
 |bucket|Name of the Azure Blob Storage or Azure Data Lake container|None|yes|
 |path|The path where data is located.|None|yes|
 
+
+### Ingesting from multiple Azure storage accounts
+To ingest from a storage account other than the one configured in `druid.azure.account`, use the `azureStorage` schema instead of the `azure` one.
+
+Sample specs:
+
+```json
+...
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "azureStorage",
+        "objectGlob": "**.json",
+        "uris": ["azureStorage://storageAccount/container/prefix1/file.json", "azureStorage://storageAccount/container/prefix2/file2.json"]
+      },
+      "inputFormat": {
+        "type": "json"
+      },
+      ...
+    },
+...
+```
+
+```json
+...
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "azureStorage",
+        "objectGlob": "**.parquet",
+        "prefixes": ["azureStorage://storageAccount/container/prefix1/", "azureStorage://storageAccount/container/prefix2/"]
+      },
+      "inputFormat": {
+        "type": "parquet"
+      },
+      ...
+    },
+...
+```
+
+
+```json
+...
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "azureStorage",
+        "objectGlob": "**.json",
+        "objects": [
+          { "bucket": "storageAccount", "path": "container/prefix1/file1.json"},
+          { "bucket": "storageAccount", "path": "container/prefix2/file2.json"}
+        ],
+        "properties": {
+          "sharedAccessStorageToken": "?sv=...<storage token secret>..."
+        }
+      },
+      "inputFormat": {
+        "type": "json"
+      },
+      ...
+    },
+...
+```
+
+|Property|Description|Default|Required|
+|--------|-----------|-------|---------|
+|type|Set the value to `azureStorage`.|None|yes|
+|uris|JSON array of URIs where the Azure objects to be ingested are located, in the form `azureStorage://<storageAccount>/<container>/<path-to-file>`|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to ingest, in the form `azureStorage://<storageAccount>/<container>/<prefix>`. Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
+|objects|JSON array of Azure objects to ingest.|None|`uris` or `prefixes` or `objects` must be set|
+|objectGlob|A glob for the object part of the Azure URI. In the URI `azureStorage://foo/bar/file.json`, the glob is applied to `bar/file.json`.<br /><br />The glob must match the entire object part, not just the filename. For example, the glob `*.json` does not match `azureStorage://foo/bar/file.json`, because the object part is `bar/file.json`, and the `*` does not match the slash. To match all objects ending in `.json`, use `**.json` instead.<br /><br />For more information, refer to the documentation for [`FileSystem#getPathMatcher`](https://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystem.html#getPathMatcher-java.lang.String-).|None|no|

Review Comment:
   Is the glob behavior noted here the same as for S3, or is it different for Azure for some reason?
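
   For reference, the `*.json` versus `**.json` behavior described in the `objectGlob` row can be reproduced with the JDK alone. This is a minimal sketch, assuming the matching goes through the `glob:` syntax of `FileSystem#getPathMatcher` as the linked Javadoc suggests; `GlobDemo` and `matches` are hypothetical names for illustration, not code from this PR, and the result assumes a filesystem that uses `/` as the path separator.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    // Hypothetical helper: applies a glob to the object part of a URI
    // (the part after the first path segment), using the JDK glob matcher.
    static boolean matches(String glob, String objectPath) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(objectPath));
    }

    public static void main(String[] args) {
        // Object part of azureStorage://foo/bar/file.json, per the table above.
        String objectPath = "bar/file.json";

        // A single '*' does not cross the directory separator.
        System.out.println("*.json  -> " + matches("*.json", objectPath));

        // '**' crosses directory boundaries, so the whole object part matches.
        System.out.println("**.json -> " + matches("**.json", objectPath));
    }
}
```

   If the S3 input source uses the same matcher, the two should behave identically, which is what the question above is trying to confirm.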



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

