[ 
https://issues.apache.org/jira/browse/NIFI-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-9892:
-----------------------------
    Status: Patch Available  (was: Open)

> Update Azure storage related processors to align with NiFi best practices
> -------------------------------------------------------------------------
>
>                 Key: NIFI-9892
>                 URL: https://issues.apache.org/jira/browse/NIFI-9892
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Azure Storage related processors have been through many iterations and 
> updates. At this point, they are not following best practices for NiFi 
> processors and, as a result, make configuring the processors difficult for 
> users. Specifically, I have found the following problems in using them:
>  * Properties are ordered inconsistently. Some processors have Credentials 
> Service as the first property, some as 4th property. The ordering of 
> 'directory' vs 'blob name' vs 'file system' are random. Processors should 
> take great care when ordering properties in order to expose the most 
> important properties first, in order to make configuration as simple as 
> possible for the user.
>  * Default values are inconsistent. Some processors use a default value for 
> "Blob", some don't. The same sort of inconsistency exists for many properties.
>  * Default values are largely missing. It does not make sense to have a 
> default value for the "Container" property for "ListAzureBlobStorage" but the 
> default value should absolutely be set for "FetchAzureBlobStorage", as well 
> as move, delete, put, etc.
>  * Poor default values. Some property values do have defaults. But they 
> default to values like "${azure.blob}" for the filename when the NiFI Core 
> Attribute of "filename" makes more sense.
>  * Inconsistent property names. Some properties use the property name "Blob" 
> while others use "Blob Name" to mean the same thing. There may be other, 
> similar examples.
>  * List processors do not populate core attributes. Listing processors should 
> always populate attributes such as "filename" and "path", rather than just 
> more specific attributes such as "azure.blob"
>  * Abstract processors exist and implement 
> `getSupportedPropertyDescriptors()`.  This is an anti-pattern. While it makes 
> sense in many cases to implement `Set<Relationship> getRelationships()`, it 
> should NOT implement `List<PropertyDescriptor> 
> getSupportedPropertyDescriptors()`. This is because the point of the abstract 
> class is for others to extend it. Others that extend it will have more 
> customized Lists of Property Descriptors. This often leads to a pattern of 
> calling `super.getSupportedPropertyDescriptors()` and then adding specific 
> properties to the end of the List. This is bad because, as noted above, 
> properties should be ordered in a way that makes the most sense for that 
> particular processor, adding the most important properties first and keeping 
> like properties together.
>  * Directory Name property is awkward and confusing. The Directory Name 
> property is required for several properties. A value of "/" is invalid and no 
> value can have a leading "/". To write to the root directory, the user must 
> go in and click the checkbox for "Empty Value". The fact that this needed to 
> be explicitly called out in the documentation is a key indicator that it 
> violates user expectations. While the Azure client may not allow a leading 
> `/`, the property should absolutely allow it. And the property should not be 
> required, allowing an unset value to default to the root directory.
>  * Code cleanup
>  ** Processors use old API for session read/write callbacks. Then create 
> `AtomicReference<Exception>` to hold Exceptions so that they can be inspected 
> later, etc. This can be cleaned up by using newer API methods that return 
> `InputStream` / `OutputStream`.
>  ** Code should mark variables `final` when possible.
>  * Integration tests no longer work
>  ** Some of the Integration tests do not work. Some work sometimes and fail 
> intermittently. They need to either be fixed or deleted
>  * FetchAzureDatalakeStorage emits both CONTENT_MODIFIED and FETCH provenance 
> events. These two should not be emitted by the same processor, as FETCH 
> implies that the content of the FlowFile was modified to match that of the 
> data that was FETCHed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to