jfrazee commented on a change in pull request #5098:
URL: https://github.com/apache/nifi/pull/5098#discussion_r645674083



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java
##########
@@ -84,12 +89,24 @@
             .allowableValues(FAIL_RESOLUTION, REPLACE_RESOLUTION, IGNORE_RESOLUTION)
             .build();
 
+    public static final PropertyDescriptor TEMP_FILE_PREFIX = new PropertyDescriptor.Builder()

Review comment:
       I think it's important to default this to enabled, even if we can't make 
it required, while still giving the user a choice of the location to write to.
   
   Not using a temp file opens up a risk of data corruption, and there 
shouldn't be a noticeable performance penalty, so I don't think there's a 
compelling case for keeping the previous behavior. Moreover, one of the goals 
of ADLS Gen2 was specifically to allow an efficient write-then-rename, so this 
improvement is for the most part the "right thing" to do.
   
   I don't think we can change it to `required(true)`, but we can treat a null 
`getValue()` as enabled, so there shouldn't be an upgrade issue.
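   Roughly what I have in mind for the null handling (the helper name and its 
semantics below are illustrative, not the actual processor code):

```java
// Sketch of the proposed backwards-compatibility check: a property value
// that was never set (null) is treated the same as explicitly enabled,
// so flows created before the property existed keep the safe temp-file
// behavior. The helper name here is hypothetical.
public class TempFileCompat {
    static boolean isTempFileEnabled(String propertyValue) {
        // null means the flow predates the property: default to enabled
        return propertyValue == null || Boolean.parseBoolean(propertyValue);
    }

    public static void main(String[] args) {
        System.out.println(isTempFileEnabled(null));    // true: legacy flow
        System.out.println(isTempFileEnabled("true"));  // true: explicit opt-in
        System.out.println(isTempFileEnabled("false")); // false: explicit opt-out
    }
}
```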

##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/test/java/org/apache/nifi/processors/azure/storage/ITPutAzureDataLakeStorage.java
##########
@@ -150,6 +240,21 @@ public void testPutBigFile() throws Exception {
         assertSuccess(DIRECTORY, FILE_NAME, fileData);
     }
 
+    @Ignore
+    // ignore excessive test with larger file size

Review comment:
       There was a recent issue that would have been caught by big-file tests, 
and since this is an IT, I think it's fair not to ignore it. At this size it 
runs quickly anyway.
   ```suggestion
   ```

##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java
##########
@@ -192,4 +211,29 @@ static void uploadContent(DataLakeFileClient fileClient, InputStream in, long le
 
         fileClient.flush(length);
     }
+
+    private DataLakeFileClient renameFile(String fileName, String directoryPath, DataLakeFileClient fileClient, boolean overwrite) {
+        try {
+            final DataLakeRequestConditions destinationCondition = new DataLakeRequestConditions();
+            if (!overwrite) {
+                destinationCondition.setIfNoneMatch("*");
+            }
+            String destinationPath = StringUtils.isNotEmpty(directoryPath)

Review comment:
       ```suggestion
               String destinationPath = StringUtils.isNotBlank(directoryPath)
   ```
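   To illustrate why the blank check matters here: a whitespace-only directory 
path passes `isNotEmpty` but not `isNotBlank`, so the empty check would 
produce a malformed destination path. (Plain-Java stand-ins for the Commons 
Lang methods are used below so the snippet is self-contained.)

```java
// Demonstrates the isNotEmpty vs. isNotBlank difference for building a
// destination path. The destination() helper mirrors the conditional in
// renameFile() above; the stand-in checks mimic Commons Lang semantics.
public class PathCheck {
    static boolean isNotEmpty(String s) { return s != null && !s.isEmpty(); }
    static boolean isNotBlank(String s) { return s != null && !s.isBlank(); }

    static String destination(String dir, String file, boolean useBlankCheck) {
        boolean hasDir = useBlankCheck ? isNotBlank(dir) : isNotEmpty(dir);
        return hasDir ? dir + "/" + file : file;
    }

    public static void main(String[] args) {
        System.out.println(destination("   ", "data.csv", false)); // "   /data.csv" (broken)
        System.out.println(destination("   ", "data.csv", true));  // "data.csv"
    }
}
```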

##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java
##########
@@ -84,12 +89,24 @@
             .allowableValues(FAIL_RESOLUTION, REPLACE_RESOLUTION, IGNORE_RESOLUTION)
             .build();
 
+    public static final PropertyDescriptor TEMP_FILE_PREFIX = new PropertyDescriptor.Builder()

Review comment:
       I haven't tested it yet, but it looks like the validator will prevent 
using directories for this?
   
   There are often downstream consumers -- Hadoop, Spark, etc. -- that I 
believe will assume temp files are written to a directory rather than as 
individual files at the root. I need to double-check that that's the case, 
but I think it'd be useful to align our default with those.
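   Sketching the difference between the two layouts (all names below are 
illustrative, not the actual defaults):

```java
// Contrast of the two temp-file layouts discussed above: a prefix on the
// file name at its final location vs. staging under a temp directory.
// Tools that skip "_"-prefixed paths (a common Hadoop/Spark convention)
// would ignore the directory-style path while an upload is in flight.
// The prefix and directory names here are hypothetical examples.
public class TempPath {
    static String prefixStyle(String prefix, String dir, String file) {
        return (dir.isEmpty() ? "" : dir + "/") + prefix + file;
    }

    static String directoryStyle(String tempDir, String dir, String file) {
        return tempDir + (dir.isEmpty() ? "" : "/" + dir) + "/" + file;
    }

    public static void main(String[] args) {
        System.out.println(prefixStyle(".tmp-", "in", "data.csv"));   // in/.tmp-data.csv
        System.out.println(directoryStyle("_tmp", "in", "data.csv")); // _tmp/in/data.csv
    }
}
```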

##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/ListAzureDataLakeStorage.java
##########
@@ -129,6 +131,16 @@
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .build();
 
+    public static final PropertyDescriptor TEMP_FILE_PREFIX = new PropertyDescriptor.Builder()
+            .name("azure-temp-file-prefix")
+            .displayName("Temp File Prefix for Azure")

Review comment:
       @exceptionfactory There are already overall filter properties. This is 
adding a filter specifically for the temp files that `PutAzureDataLakeStorage` 
is creating/using, so it's reasonably named, but your question got me 
thinking: why do we need another filter property at all?
   
   I think it'd be clearer and simpler to include a default filter for hidden 
files, with instructions to the user to set the filter themselves if they 
want to override that.
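   Something like the default below, which the user could then override with 
their own pattern (the regex is illustrative, not an actual property value):

```java
import java.util.regex.Pattern;

// Sketch of the "default filter with override" idea: the listing processor
// could ship a default file filter that excludes dot-prefixed (hidden)
// temp files, and a user-supplied pattern would replace it. The default
// regex here is an assumption, not NiFi's actual default.
public class HiddenFileFilter {
    // match any file name that does NOT start with a dot
    static final Pattern DEFAULT_FILTER = Pattern.compile("^[^.].*");

    static boolean isListed(String fileName, Pattern filter) {
        return filter.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isListed("data.csv", DEFAULT_FILTER));      // true
        System.out.println(isListed(".tmp-data.csv", DEFAULT_FILTER)); // false
    }
}
```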




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

