[
https://issues.apache.org/jira/browse/NIFI-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-9892:
-----------------------------
Status: Patch Available (was: Open)
> Update Azure storage related processors to align with NiFi best practices
> -------------------------------------------------------------------------
>
> Key: NIFI-9892
> URL: https://issues.apache.org/jira/browse/NIFI-9892
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The Azure Storage related processors have been through many iterations and
> updates. At this point, they are not following best practices for NiFi
> processors and, as a result, make configuring the processors difficult for
> users. Specifically, I have found the following problems in using them:
> * Properties are ordered inconsistently. Some processors have Credentials
> Service as the first property, some as 4th property. The ordering of
> 'directory' vs 'blob name' vs 'file system' are random. Processors should
> take great care when ordering properties in order to expose the most
> important properties first, in order to make configuration as simple as
> possible for the user.
> * Default values are inconsistent. Some processors use a default value for
> "Blob", some don't. The same sort of inconsistency exists for many properties.
> * Default values are largely missing. It does not make sense to have a
> default value for the "Container" property for "ListAzureBlobStorage" but the
> default value should absolutely be set for "FetchAzureBlobStorage", as well
> as move, delete, put, etc.
> * Poor default values. Some property values do have defaults. But they
> default to values like "${azure.blob}" for the filename when the NiFI Core
> Attribute of "filename" makes more sense.
> * Inconsistent property names. Some properties use the property name "Blob"
> while others use "Blob Name" to mean the same thing. There may be other,
> similar examples.
> * List processors do not populate core attributes. Listing processors should
> always populate attributes such as "filename" and "path", rather than just
> more specific attributes such as "azure.blob"
> * Abstract processors exist and implement
> `getSupportedPropertyDescriptors()`. This is an anti-pattern. While it makes
> sense in many cases to implement `Set<Relationship> getRelationships()`, it
> should NOT implement `List<PropertyDescriptor>
> getSupportedPropertyDescriptors()`. This is because the point of the abstract
> class is for others to extend it. Others that extend it will have more
> customized Lists of Property Descriptors. This often leads to a pattern of
> calling `super.getSupportedPropertyDescriptors()` and then adding specific
> properties to the end of the List. This is bad because, as noted above,
> properties should be ordered in a way that makes the most sense for that
> particular processor, adding the most important properties first and keeping
> like properties together.
> * Directory Name property is awkward and confusing. The Directory Name
> property is required for several properties. A value of "/" is invalid and no
> value can have a leading "/". To write to the root directory, the user must
> go in and click the checkbox for "Empty Value". The fact that this needed to
> be explicitly called out in the documentation is a key indicator that it
> violates user expectations. While the Azure client may not allow a leading
> `/`, the property should absolutely allow it. And the property should not be
> required, allowing an unset value to default to the root directory.
> * Code cleanup
> ** Processors use old API for session read/write callbacks. Then create
> `AtomicReference<Exception>` to hold Exceptions so that they can be inspected
> later, etc. This can be cleaned up by using newer API methods that return
> `InputStream` / `OutputStream`.
> ** Code should mark variables `final` when possible.
> * Integration tests no longer work
> ** Some of the Integration tests do not work. Some work sometimes and fail
> intermittently. They need to either be fixed or deleted
> * FetchAzureDatalakeStorage emits both CONTENT_MODIFIED and FETCH provenance
> events. These two should not be emitted by the same processor, as FETCH
> implies that the content of the FlowFile was modified to match that of the
> data that was FETCHed.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)