[GitHub] [nifi] turcsanyip commented on a change in pull request #4556: NIFI-7830: Support large files in PutAzureDataLakeStorage

GitBox Fri, 25 Sep 2020 15:57:44 -0700


turcsanyip commented on a change in pull request #4556:
URL: https://github.com/apache/nifi/pull/4556#discussion_r495299416




##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java
##########
@@ -120,11 +122,29 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
 
                 final long length = flowFile.getSize();
                 if (length > 0) {
-                    try (final InputStream rawIn = session.read(flowFile); final BufferedInputStream in = new BufferedInputStream(rawIn)) {
-                        fileClient.append(in, 0, length);
+                    long chunkStart = 0;
+                    long chunkSize;
+
+                    try (final InputStream rawIn = session.read(flowFile);
+                         final BufferedInputStream in = new BufferedInputStream(rawIn) {
+                             @Override
+                             public int available() {
+                                 // com.azure.storage.common.Utility.convertStreamToByteBuffer() throws an exception
+                                 // if there are more available bytes in the stream after reading the chunk
+                                 return 0;

Review comment:
       @MuazmaZ Do you happen to know why `Utility.convertStreamToByteBuffer()` throws an exception when `available() > 0`?
   
https://github.com/Azure/azure-sdk-for-java/blob/0345889402425191b7003e73b7b3d6ea3c0a5175/sdk/storage/azure-storage-common/src/main/java/com/azure/storage/common/Utility.java#L268
   
   Because of this check, it is not possible to pass a longer input stream to the client in portions / chunks.
   As a workaround, I overrode `available()` so that it always returns 0, which makes the stream claim that it has no more data. This is not really nice, but it works.
   Another option would be to read the chunks in a loop into a byte array on our side and pass a stream over that byte array to the Azure client lib. But I would rather avoid this extra copy and the extra memory for the buffer.
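
   To make the trade-off concrete, here is a minimal sketch of both options using only the JDK. It is not the PR code: `ChunkConsumer` is a hypothetical stand-in for a call shaped like `fileClient.append(in, offset, length)`, and the class, method, and parameter names (`ChunkedStreamSketch`, `hideAvailable`, `uploadShared`, `uploadCopied`, `maxChunk`) are assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedStreamSketch {

    /** Hypothetical stand-in for a client call that consumes exactly `length` bytes from `in`. */
    public interface ChunkConsumer {
        void append(InputStream in, long offset, long length) throws IOException;
    }

    /** Workaround: wrap the stream so that available() always reports 0. */
    public static InputStream hideAvailable(final InputStream rawIn) {
        return new BufferedInputStream(rawIn) {
            @Override
            public int available() {
                // Pretend nothing is left, even if more chunks follow.
                return 0;
            }
        };
    }

    /** Option 1: hand the same wrapped stream to the consumer for every chunk (no extra copy). */
    public static void uploadShared(final InputStream rawIn, final long total, final long maxChunk,
                                    final ChunkConsumer consumer) throws IOException {
        try (InputStream in = hideAvailable(rawIn)) {
            long chunkStart = 0;
            while (chunkStart < total) {
                final long chunkSize = Math.min(total - chunkStart, maxChunk);
                consumer.append(in, chunkStart, chunkSize);
                chunkStart += chunkSize;
            }
        }
    }

    /** Option 2: copy each chunk into a byte array first (one extra copy, one extra buffer). */
    public static void uploadCopied(final InputStream in, final long total, final int maxChunk,
                                    final ChunkConsumer consumer) throws IOException {
        final byte[] buffer = new byte[maxChunk];
        long chunkStart = 0;
        while (chunkStart < total) {
            final int chunkSize = (int) Math.min(total - chunkStart, maxChunk);
            int filled = 0;
            // read() may return fewer bytes than requested, so loop until the chunk is full.
            while (filled < chunkSize) {
                final int n = in.read(buffer, filled, chunkSize - filled);
                if (n < 0) {
                    throw new EOFException("Stream ended before " + total + " bytes were read");
                }
                filled += n;
            }
            consumer.append(new ByteArrayInputStream(buffer, 0, chunkSize), chunkStart, chunkSize);
            chunkStart += chunkSize;
        }
    }
}
```

   In option 2 each `ByteArrayInputStream` holds exactly one chunk, so nothing remains available after the chunk has been read. The price is the copy and the `maxChunk`-sized buffer mentioned above.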
   



