ASF GitHub Bot commented on NIFI-4826:

Github user zenfenan commented on a diff in the pull request:

    --- Diff: 
    @@ -106,7 +106,8 @@
             attributes.put("azure.etag", entity.getEtag());
             attributes.put("azure.primaryUri", entity.getPrimaryUri());
             attributes.put("azure.secondaryUri", entity.getSecondaryUri());
    -        attributes.put("azure.blobname", entity.getName());
    +        attributes.put("azure.blobname", entity.getBlobName());
    +        attributes.put("filename", entity.getName());
    --- End diff --
    I didn't add `filename` just to find a use for `entity.getName()`. As of 
1.5.0, the filename attribute written by ListAzureBlobStorage is a large 
number, much like the filenames of flowfiles generated by the GenerateFlowFile 
processor. The change here is intended to provide a meaningful filename, i.e. 
the actual name of the blob.
    My assumption was that people are either updating the filename using 
UpdateAttribute/ExecuteScript (parsing azure.primaryUri and taking the file 
name) or simply ignoring the filename altogether. In either case, this change 
won't affect them, right? For example, if people are overwriting the filename 
attribute, whatever this contribution writes as the filename will still be 
overwritten (again, I am assuming they parse the filename from the 
azure.primaryUri attribute, since that's the only attribute that provides the 
blob name). If they aren't touching the filename attribute, then this simply 
produces a meaningful filename. I hope that makes clear what I'm trying to 
say.
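
    To illustrate the compatibility argument above, here is a minimal sketch 
that uses a plain Java map as a stand-in for flowfile attributes. The class and 
method names are hypothetical and nothing here is NiFi API; it only shows that 
a downstream overwrite of `filename` wins regardless of what the listing step 
wrote.

```java
import java.util.HashMap;
import java.util.Map;

public class FilenameOverwriteSketch {

    // Hypothetical stand-in for the attributes written by the listing step.
    static Map<String, String> listedAttributes(String blobName, String filename) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("azure.blobname", blobName);
        attributes.put("filename", filename);
        return attributes;
    }

    // Hypothetical stand-in for a downstream step that sets its own filename.
    static Map<String, String> overwriteFilename(Map<String, String> attributes, String newName) {
        Map<String, String> updated = new HashMap<>(attributes);
        updated.put("filename", newName); // replaces whatever value was there before
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> listed = listedAttributes("path/to/the/blob", "blob");
        Map<String, String> downstream = overwriteFilename(listed, "renamed.csv");
        System.out.println(listed.get("filename"));
        System.out.println(downstream.get("filename"));
    }
}
```

    Flows that never touch `filename` would see the listed value; flows that 
overwrite it would see their own value either way.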

> ListAzureBlobStorage doesn't write azure.blobname properly
> ----------------------------------------------------------
>                 Key: NIFI-4826
>                 URL: https://issues.apache.org/jira/browse/NIFI-4826
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>            Reporter: zenfenan
>            Priority: Minor
> ListAzureBlobStorage currently takes the substring of the blob's primary 
> URI starting at primaryUri.lastIndexOf('/') + 1 and writes that as 
> azure.blobname. For example, if the blob is at the path 
> "mystorageaccountname.blob.core.windows.net/container-name/path/to/the/blob", 
> it will write azure.blobname as "blob". So if the blob is located under a 
> multi-level directory structure such as the one above, this causes trouble 
> in downstream processors like FetchAzureBlobStorage, which expect the full 
> blob name, i.e. "path/to/the/blob". Giving just "blob" here will fail.
> A workaround that can be followed right now is to use "ExecuteScript" and 
> take the substring of the primary URI, i.e. everything after 
> "https://"+storageAccountName+"/"+containerName+"/". A better approach would 
> be to make use of the CloudBlob.getName() API provided in the Azure SDK. It 
> should be a minor change since we are already using this SDK and the said 
> class in our processor.
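
The difference between the current behaviour and the workaround described 
above can be sketched in plain Java. This is an illustration only, not the 
processor's code: the class and method names are hypothetical, and the 
container name is assumed to be known to the caller.

```java
import java.net.URI;

public class BlobNameSketch {

    // Behaviour described in the issue: keep only the text after the last '/'.
    static String lastSegment(String primaryUri) {
        return primaryUri.substring(primaryUri.lastIndexOf('/') + 1);
    }

    // Workaround described in the issue: strip the "/<container>/" prefix
    // from the URI path so the full blob name is kept.
    static String fullBlobName(String primaryUri, String containerName) {
        String path = URI.create(primaryUri).getPath();
        String prefix = "/" + containerName + "/";
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException("URI is not inside container " + containerName);
        }
        return path.substring(prefix.length());
    }

    public static void main(String[] args) {
        String uri = "https://mystorageaccountname.blob.core.windows.net"
                + "/container-name/path/to/the/blob";
        System.out.println(lastSegment(uri));                    // prints "blob"
        System.out.println(fullBlobName(uri, "container-name")); // prints "path/to/the/blob"
    }
}
```

With the SDK approach proposed in the issue, this parsing would not be needed 
at all, since the blob name would come from CloudBlob.getName() directly.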

This message was sent by Atlassian JIRA
