Hi Oleksii,

Thanks for the suggestion. It seems that this is intentionally not supported by 
Azure due to the way they store chunked 
blobs<https://stackoverflow.com/questions/42229153/how-to-check-azure-storage-blob-file-uploaded-correctly>.
 Files larger than 256 MB are uploaded in chunks, and a transactional MD5 
checksum is computed automatically for each chunk.


The x-ms-blob-content-md5 header can be used to set the blob's Content-MD5 
property, but that value won't be verified on the server side. Instead, Azure 
documentation<https://technet2.github.io/Wiki/blogs/windowsazurestorage/windows-azure-blob-md5-overview.html>
 suggests using HTTPS as a more secure alternative, relying on the 
transactional data integrity it provides.

Best Regards,
Lehel



________________________________
From: Oleksii Zhurko (Contractor) <oleksii.zhu...@merative.com>
Sent: Thursday, October 26, 2023 18:58
To: dev@nifi.apache.org <dev@nifi.apache.org>
Subject: MD5 support for files larger than 256MB in PutAzureBlobStorage_v12

Dear Team,

I am writing to address a potential enhancement in the PutAzureBlobStorage_v12 
processor of NiFi.

The deprecated PutAzureBlobStorage processor supports generating MD5 using the 
com.microsoft.azure:azure-storage-blob dependency but does not support 
Service Principal authentication. When the PutAzureBlobStorage_v12 processor 
was created, the previous dependency was replaced with 
com.azure:azure-storage-blob, which does not generate MD5 for content 
exceeding 256MB. This behavior has posed some challenges in data verification 
and integrity checks.

I would kindly suggest supporting larger files as well. Here is a conceptual 
approach that could possibly be used to achieve this. The 
com.azure:azure-storage-blob dependency allows the MD5 to be set manually by 
invoking the setHttpHeaders method on the BlobClient, passing it 
BlobHttpHeaders with a contentMd5 value generated, for instance, via 
MessageDigest. The setHttpHeaders logic should be triggered only when the 
BlockBlobItem returned by the upload has a null contentMd5.
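
The approach described above could be sketched roughly as follows. This is 
only an illustration, not the processor's actual code: the class name 
Md5Sketch and the helper md5Of are hypothetical, and the Azure calls appear 
only in comments because they need com.azure:azure-storage-blob on the 
classpath. The runnable part shows how the digest can be computed with 
MessageDigest while streaming, so that large content is never held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper class, named here for illustration only.
final class Md5Sketch {

    // Streams the content through MessageDigest in fixed-size buffers and
    // returns the raw 16-byte MD5 digest.
    static byte[] md5Of(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] md5 = md5Of(new ByteArrayInputStream(content));

        // Content-MD5 is conventionally shown Base64-encoded.
        System.out.println(Base64.getEncoder().encodeToString(md5));

        // Assumption: with the Azure SDK available (not compiled here), the
        // digest could be applied after the upload along these lines, where
        // blockBlobItem is the upload result and blobClient is the BlobClient:
        //
        //   if (blockBlobItem.getContentMd5() == null) {
        //       blobClient.setHttpHeaders(new BlobHttpHeaders().setContentMd5(md5));
        //   }
    }
}
```

One caveat with this sketch: as far as I know, setHttpHeaders replaces the 
blob's whole set of HTTP headers, so an implementation would probably need to 
pass the content type and other headers again along with the MD5. The stored 
value would also not be verified by the service against the content.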

Lastly, if you decide to implement this enhancement, could you provide any 
insights regarding its possible inclusion in a future release?

Thank you for your time and consideration. Looking forward to your positive 
response.


--
Best regards,
Oleksii
