Mark Payne created NIFI-9892:
--------------------------------
Summary: Update Azure storage related processors to align with
NiFi best practices
Key: NIFI-9892
URL: https://issues.apache.org/jira/browse/NIFI-9892
Project: Apache NiFi
Issue Type: Improvement
Components: Extensions
Reporter: Mark Payne
Assignee: Mark Payne
The Azure Storage related processors have been through many iterations and
updates. At this point, they are not following best practices for NiFi
processors and, as a result, make configuring the processors difficult for
users. Specifically, I have found the following problems in using them:
* Properties are ordered inconsistently. Some processors have Credentials
Service as the first property, some as 4th property. The ordering of
'directory' vs 'blob name' vs 'file system' are random. Processors should take
great care when ordering properties in order to expose the most important
properties first, in order to make configuration as simple as possible for the
user.
* Default values are inconsistent. Some processors use a default value for
"Blob", some don't. The same sort of inconsistency exists for many properties.
* Default values are largely missing. It does not make sense to have a default
value for the "Container" property for "ListAzureBlobStorage" but the default
value should absolutely be set for "FetchAzureBlobStorage", as well as move,
delete, put, etc.
* Poor default values. Some property values do have defaults. But they default
to values like "${azure.blob}" for the filename when the NiFI Core Attribute of
"filename" makes more sense.
* Inconsistent property names. Some properties use the property name "Blob"
while others use "Blob Name" to mean the same thing. There may be other,
similar examples.
* List processors do not populate core attributes. Listing processors should
always populate attributes such as "filename" and "path", rather than just more
specific attributes such as "azure.blob"
* Abstract processors exist and implement `getSupportedPropertyDescriptors()`.
This is an anti-pattern. While it makes sense in many cases to implement
`Set<Relationship> getRelationships()`, it should NOT implement
`List<PropertyDescriptor> getSupportedPropertyDescriptors()`. This is because
the point of the abstract class is for others to extend it. Others that extend
it will have more customized Lists of Property Descriptors. This often leads to
a pattern of calling `super.getSupportedPropertyDescriptors()` and then adding
specific properties to the end of the List. This is bad because, as noted
above, properties should be ordered in a way that makes the most sense for that
particular processor, adding the most important properties first and keeping
like properties together.
* Directory Name property is awkward and confusing. The Directory Name
property is required for several properties. A value of "/" is invalid and no
value can have a leading "/". To write to the root directory, the user must go
in and click the checkbox for "Empty Value". The fact that this needed to be
explicitly called out in the documentation is a key indicator that it violates
user expectations. While the Azure client may not allow a leading `/`, the
property should absolutely allow it. And the property should not be required,
allowing an unset value to default to the root directory.
* Code cleanup
** Processors use old API for session read/write callbacks. Then create
`AtomicReference<Exception>` to hold Exceptions so that they can be inspected
later, etc. This can be cleaned up by using newer API methods that return
`InputStream` / `OutputStream`.
** Code should mark variables `final` when possible.
* Integration tests no longer work
** Some of the Integration tests do not work. Some work sometimes and fail
intermittently. They need to either be fixed or deleted
* FetchAzureDatalakeStorage emits both CONTENT_MODIFIED and FETCH provenance
events. These two should not be emitted by the same processor, as FETCH implies
that the content of the FlowFile was modified to match that of the data that
was FETCHed.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)