jfrazee commented on a change in pull request #5098:
URL: https://github.com/apache/nifi/pull/5098#discussion_r645674083
##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java
##########
@@ -84,12 +89,24 @@
.allowableValues(FAIL_RESOLUTION, REPLACE_RESOLUTION, IGNORE_RESOLUTION)
.build();
+ public static final PropertyDescriptor TEMP_FILE_PREFIX = new PropertyDescriptor.Builder()
Review comment:
I think it's important to default this to being enabled, if not required, but with a choice of the location to write to.
Not using a temp file opens up a risk of data corruption, and there shouldn't be a noticeable performance penalty, so I don't think there's a compelling case for keeping the previous behavior. Moreover, one of the goals of ADLS Gen2 was specifically to allow efficient write-then-rename, so this improvement is for the most part the "right thing" to do.
I don't think we can change it to `required(true)`, but we can treat a null `getValue()` as enabled, so there shouldn't be an upgrade issue.
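The null-as-enabled idea might look roughly like the following (illustrative names only, not the PR's actual code): a flow saved before the property existed returns `null` from `getValue()`, and we resolve that to the safer temp-file behavior instead of disabling it.

```java
// Sketch of backward-compatible property resolution. The method name and
// class are hypothetical; only the null-means-enabled rule is the point.
public class TempFilePropertyCompat {

    // A property that was never set (pre-upgrade flow) comes back null;
    // treat that the same as an explicit "true".
    static boolean isTempFileWriteEnabled(String rawValue) {
        return rawValue == null || Boolean.parseBoolean(rawValue);
    }

    public static void main(String[] args) {
        System.out.println(isTempFileWriteEnabled(null));    // pre-upgrade flow: enabled
        System.out.println(isTempFileWriteEnabled("true"));  // enabled
        System.out.println(isTempFileWriteEnabled("false")); // explicitly disabled
    }
}
```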
##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/test/java/org/apache/nifi/processors/azure/storage/ITPutAzureDataLakeStorage.java
##########
@@ -150,6 +240,21 @@ public void testPutBigFile() throws Exception {
assertSuccess(DIRECTORY, FILE_NAME, fileData);
}
+ @Ignore
+ // ignore excessive test with larger file size
Review comment:
There was a recent issue that would have been caught by big-file tests, and since this is an IT, I think it's fair not to ignore it. At this size it runs quickly anyway.
```suggestion
```
##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java
##########
@@ -192,4 +211,29 @@ static void uploadContent(DataLakeFileClient fileClient, InputStream in, long le
fileClient.flush(length);
}
+
+ private DataLakeFileClient renameFile(String fileName, String directoryPath, DataLakeFileClient fileClient, boolean overwrite) {
+ try {
+ final DataLakeRequestConditions destinationCondition = new DataLakeRequestConditions();
+ if (!overwrite) {
+ destinationCondition.setIfNoneMatch("*");
+ }
+ String destinationPath = StringUtils.isNotEmpty(directoryPath)
Review comment:
```suggestion
String destinationPath = StringUtils.isNotBlank(directoryPath)
```
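The difference the suggestion is guarding against can be shown with minimal stand-ins for the commons-lang3 `StringUtils` methods of the same names (the helper below is illustrative, not the PR's code): a directory path of only whitespace passes `isNotEmpty` but not `isNotBlank`, which would otherwise build a destination like `" /data.csv"`.

```java
// Minimal stand-ins mirroring commons-lang3 StringUtils semantics, to show
// why isNotBlank is the safer guard when building a destination path.
public class BlankVsEmpty {

    static boolean isNotEmpty(String s) {
        return s != null && !s.isEmpty();
    }

    static boolean isNotBlank(String s) {
        return s != null && !s.trim().isEmpty();
    }

    // Hypothetical path builder using the suggested isNotBlank check.
    static String destinationPath(String directoryPath, String fileName) {
        return isNotBlank(directoryPath) ? directoryPath + "/" + fileName : fileName;
    }

    public static void main(String[] args) {
        System.out.println(isNotEmpty(" "));                  // true
        System.out.println(isNotBlank(" "));                  // false
        System.out.println(destinationPath(" ", "data.csv")); // data.csv
    }
}
```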
##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java
##########
@@ -84,12 +89,24 @@
.allowableValues(FAIL_RESOLUTION, REPLACE_RESOLUTION, IGNORE_RESOLUTION)
.build();
+ public static final PropertyDescriptor TEMP_FILE_PREFIX = new PropertyDescriptor.Builder()
Review comment:
I haven't tested yet, but it looks like the validator will prevent using directories for this?
There are often downstream consumers -- Hadoop, Spark, etc. -- that I believe assume temp files will be written to a directory rather than as individual files at the root. I need to double-check that that's the case, but I think it'd be useful to align our default with those.
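The directory-alignment point can be sketched with a hypothetical helper (not the PR's code): with plain concatenation, a prefix ending in `/` naturally places the temp file in a subdirectory, while any other prefix only decorates the file name -- which is exactly what a slash-rejecting validator would rule out.

```java
// Hypothetical temp-path resolution; names and behavior are illustrative.
public class TempLocation {

    // A prefix ending in '/' yields a subdirectory; any other prefix
    // decorates the file name in place.
    static String tempPath(String directory, String fileName, String prefix) {
        final String dir = directory.isEmpty() ? "" : directory + "/";
        return dir + prefix + fileName;
    }

    public static void main(String[] args) {
        System.out.println(tempPath("out", "data.csv", "_tmp_")); // out/_tmp_data.csv
        System.out.println(tempPath("out", "data.csv", "_tmp/")); // out/_tmp/data.csv
    }
}
```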
##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/ListAzureDataLakeStorage.java
##########
@@ -129,6 +131,16 @@
.expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
.build();
+ public static final PropertyDescriptor TEMP_FILE_PREFIX = new PropertyDescriptor.Builder()
+ .name("azure-temp-file-prefix")
+ .displayName("Temp File Prefix for Azure")
Review comment:
@exceptionfactory There are already overall filter properties. This is adding a filter specifically for the temp files that `PutAzureDataLakeStorage` is creating/using, so it's reasonably named, but your question got me thinking: why do we need another filter property?
I think it'd be clearer and simpler to include a default filter for hidden files, with instructions to the user to set the filter if they want to override that.
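A default-filter approach might look like the following sketch (the pattern, names, and filtering helper are illustrative, not NiFi's actual API): exclude names starting with `.` or `_`, which matches the convention Hadoop-style readers use for hidden and in-progress files, and let the user replace the pattern to override.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical default file filter: skip hidden (".") and temp ("_")
// prefixed names unless the user overrides the pattern.
public class DefaultHiddenFileFilter {

    static final Pattern DEFAULT_FILE_FILTER = Pattern.compile("^(?![._]).*");

    static List<String> applyFilter(List<String> fileNames, Pattern filter) {
        return fileNames.stream()
                .filter(name -> filter.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listed = List.of("data.csv", "_tmp_data.csv", ".hidden", "report.json");
        System.out.println(applyFilter(listed, DEFAULT_FILE_FILTER));
        // [data.csv, report.json]
    }
}
```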
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]