[
https://issues.apache.org/jira/browse/NIFI-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zenfenan updated NIFI-4826:
---------------------------
Description:
ListAzureBlobStorage currently derives azure.blobname by taking the substring
of the blob's primary URI that follows the last '/', i.e. the substring
starting at primaryUri.lastIndexOf('/') + 1. For example, for a blob at
"mystorageaccountname.blob.core.windows.net/container-name/path/to/the/blob",
it writes azure.blobname as "blob". When a blob sits under a multi-level
directory hierarchy like the one above, this breaks downstream processors such
as FetchAzureBlobStorage, which expect the full blob name, i.e.
"path/to/the/blob". Passing just "blob" fails.
A workaround for now is to use "ExecuteScript" to extract the part of the
primary URI that follows
"https://"+storageAccountName+"/"+containerName+"/". A better approach would be
to use the CloudBlob.getName() API provided by the Azure SDK. It should be a
minor change, since the processor already uses this SDK and this class.
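
To illustrate the difference, here is a minimal sketch that uses only
java.net.URI, not the Azure SDK. The class and method names (BlobNameSketch,
lastSegment, nameWithinContainer) are hypothetical and are not taken from the
NiFi code base. The first method mimics the last-'/' substring behaviour
described above; the second approximates the workaround by stripping the
container prefix from the URI path. In the processor itself, the suggested fix
is to call CloudBlob.getName() rather than parse the URI.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of NiFi or the Azure SDK.
final class BlobNameSketch {

    // Mimics the behaviour described in the issue:
    // keep only what follows the last '/' of the primary URI.
    static String lastSegment(URI primaryUri) {
        String uri = primaryUri.toString();
        return uri.substring(uri.lastIndexOf('/') + 1);
    }

    // Approximates the workaround: drop the leading "/<container>/"
    // from the URI path so the full blob name remains.
    static String nameWithinContainer(URI primaryUri, String containerName) {
        String prefix = "/" + containerName + "/";
        String path = primaryUri.getPath();
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException(
                    "URI is not inside container " + containerName);
        }
        return path.substring(prefix.length());
    }

    public static void main(String[] args) {
        URI uri = URI.create(
                "https://mystorageaccountname.blob.core.windows.net"
                        + "/container-name/path/to/the/blob");
        System.out.println(lastSegment(uri));                           // blob
        System.out.println(nameWithinContainer(uri, "container-name")); // path/to/the/blob
    }
}
```

The second value is the one FetchAzureBlobStorage needs, per the description
above.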
> ListAzureBlobStorage doesn't write azure.blobname properly
> ----------------------------------------------------------
>
> Key: NIFI-4826
> URL: https://issues.apache.org/jira/browse/NIFI-4826
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
> Reporter: zenfenan
> Priority: Minor