Mark Payne created NIFI-5685:
--------------------------------
Summary: Allow processors' relationships to be grouped together
Key: NIFI-5685
URL: https://issues.apache.org/jira/browse/NIFI-5685
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework, Core UI, Documentation & Website
Reporter: Mark Payne
One of the key tenets of NiFi is that a Processor knows whether or not it failed
to do its specific job - but does not know whether a 'failure' occurred within
the context of the flow itself. This is the reason that we often see a
'success' and a 'failure' relationship and let the user choose how to handle a
'failure', instead of using some more abstract mechanism such as a Dead-Letter
Queue.
Quite often, though, processors have many reasons that they could fail to
perform their task. For example, if a PutFile processor fails, the FlowFile is
routed to 'failure' and it may not be clear to the user without looking at
logs/bulletins, etc. why it failed. Did the destination directory not exist?
Was there already a file with that name? Out of disk space/general IO problem?
There are times when a user wants to handle each of these failures differently.
At present, we tend to do one of two things:
1) Add a new relationship. We may now have a separate relationship for
'duplicate.filename', one for 'directory.missing', one for 'io.failure', etc.
Unfortunately, in this case adding new relationships can result in making
existing flows invalid because not all relationships are connected.
Additionally, when the user goes to create a connection / auto-terminate, they
now have a lot of different relationships that they have to deal with, and this
is a pain if they want to treat all failures the same way. This also often
leads to relationships like 'Retry' that are poorly named because, as described
above, the developer cannot really know at compile time whether the FlowFile
should be retried; that depends on the context of the flow itself and the
user's intent/desire.
2) The second approach that is sometimes taken is to add an attribute like
"failure.reason". This is problematic for a couple of reasons. First, users
then must route 'failure' to a RouteOnAttribute processor to route the FlowFile
based on all of the possible conditions. Secondly, this requires that all
conditions be clearly documented and not change. Thirdly, this is error-prone
because it's easy to make a typo or forget a particular value in your
RouteOnAttribute.
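To make the typo problem concrete, here is a minimal sketch in plain Java. It does not use the NiFi API; it only models the idea of routing on the string value of a "failure.reason" attribute. The class name, the route names, and the helper methods are hypothetical, and real RouteOnAttribute properties are Expression Language expressions rather than a simple lookup table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FailureReasonRoutingSketch {

    /**
     * Returns the route configured for the given failure reason, or
     * "unmatched" when no configured value equals the attribute.
     */
    static String route(Map<String, String> routesByReason, String failureReason) {
        return routesByReason.getOrDefault(failureReason, "unmatched");
    }

    /** Hypothetical user configuration containing one typo and one omission. */
    static Map<String, String> userConfiguredRoutes() {
        Map<String, String> routes = new LinkedHashMap<>();
        routes.put("duplicate.filename", "rename-and-retry");
        routes.put("io.failrue", "alert-operator"); // typo: should be "io.failure"
        // "directory.missing" was forgotten entirely
        return routes;
    }

    public static void main(String[] args) {
        Map<String, String> routes = userConfiguredRoutes();
        System.out.println(route(routes, "duplicate.filename")); // rename-and-retry
        System.out.println(route(routes, "io.failure"));         // unmatched
        System.out.println(route(routes, "directory.missing"));  // unmatched
    }
}
```

Nothing flags the misspelled or missing value; the FlowFile simply ends up on the fallback route.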
So, I propose allowing Relationships to be grouped together. From the Processor
developer's point of view, it might look like the following:
{{final Relationship SUCCESS = new Relationship.Builder()}}
{{ .name("success")}}
{{ .explanation("Data successfully written to disk")}}
{{ .build(); // no grouping}}
{{final Relationship DUPLICATE_FILENAME = new Relationship.Builder()}}
{{ .name("duplicate.filename")}}
{{ .explanation("A file already exists with the same filename")}}
{{ .group("failure")}}
{{ .build();}}
{{final Relationship IO_FAILURE = new Relationship.Builder()}}
{{ .name("io.failure")}}
{{ .explanation("Unable to store the data to disk due to an I/O failure, such as too many open files, out of storage space, etc.")}}
{{ .group("failure")}}
{{ .build();}}
In the UI, then, when a user is creating a Connection or updating one, or
auto-terminating Relationships, they should be able to choose "success" or
"failure" - or expand the "failure" somehow and choose individual
relationships. The general "failure" relationship does not actually exist but
instead is a grouping of all "failure" relationships.
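One way the framework might expand such a selection is sketched below in plain Java. This is an illustration under assumptions, not the NiFi API: GroupedRelationship, resolve, and putFileRelationships are hypothetical names, and the relationship names are taken from the example in this issue.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RelationshipGroupSketch {

    /** Hypothetical stand-in for a Relationship that carries an optional group name. */
    static final class GroupedRelationship {
        final String name;
        final String group; // null means "no grouping"

        GroupedRelationship(String name, String group) {
            this.name = name;
            this.group = group;
        }
    }

    /**
     * Expands a user's selection into concrete relationship names. A selection
     * matches a relationship either by its own name or by its group name, so
     * "failure" itself never has to exist as a real relationship.
     */
    static Set<String> resolve(List<GroupedRelationship> declared, String selection) {
        Set<String> resolved = new LinkedHashSet<>();
        for (GroupedRelationship rel : declared) {
            if (rel.name.equals(selection) || selection.equals(rel.group)) {
                resolved.add(rel.name);
            }
        }
        return resolved;
    }

    /** The three relationships from the example in this issue. */
    static List<GroupedRelationship> putFileRelationships() {
        return Arrays.asList(
                new GroupedRelationship("success", null),
                new GroupedRelationship("duplicate.filename", "failure"),
                new GroupedRelationship("io.failure", "failure"));
    }

    public static void main(String[] args) {
        List<GroupedRelationship> declared = putFileRelationships();
        System.out.println(resolve(declared, "failure"));    // [duplicate.filename, io.failure]
        System.out.println(resolve(declared, "io.failure")); // [io.failure]
        System.out.println(resolve(declared, "success"));    // [success]
    }
}
```

Because this sketch resolves the group at lookup time, a relationship added to the "failure" group later would be covered by an existing "failure" selection without further configuration.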
This still gives the user an easy way to select the appropriate relationships.
It also gives the user much more control over how to route data when a failure
occurs. Additionally, routing to 'duplicate.filename', for instance, means that
the Provenance data will also have a lot more context so that users can later
understand why the failure occurred. And it does this without the error-prone
steps required by the second approach above.