[jira] [Commented] (NIFI-3415) Add "Rollback on Failure" property to PutHiveStreaming, PutHiveQL, and PutSQL

Koji Kawamura (JIRA) Mon, 13 Feb 2017 18:33:53 -0800

    [ 
https://issues.apache.org/jira/browse/NIFI-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864907#comment-15864907
 ]


Koji Kawamura commented on NIFI-3415:
-------------------------------------

[~mattyb149], I started looking at the code for those processors to figure out 
how we can add this improvement.

h3. Q1: Technically not rollback, but commit and move forward
Since a mixed state of processed and failed incoming FlowFiles is possible, the 
NiFi process session has to move forward. So, I'm going to route the FlowFiles 
that are currently sent to either the 'failure' or 'retry' relationship back to 
'SELF' when something goes wrong.
Technically this doesn't roll back the NiFi process session; it puts the 
FlowFiles back into the incoming queue and commits, so that they can be 
processed again.
Is this reasonable? If we did roll back the NiFi process session, the FlowFiles 
that were already processed properly (meaning their RDB transactions are already 
committed) would be rolled back as well.
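
To illustrate the difference, here is a toy model in plain Java. It does not use the NiFi API; the class {{CommitAndRequeueSketch}}, its {{Outcome}} type and both method names are hypothetical, and the database write is simulated by a predicate. It only shows which FlowFiles end up back in the incoming queue under each approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model only: these types are hypothetical and are not the NiFi API. */
final class CommitAndRequeueSketch {

    /** Result of one batch: what went downstream and what stays queued. */
    static final class Outcome {
        final List<String> success = new ArrayList<>();
        final List<String> backInQueue = new ArrayList<>();
    }

    /** Route failed items back to the incoming queue ('SELF'), then commit. */
    static Outcome commitAndRequeue(List<String> batch, Predicate<String> dbWrite) {
        Outcome out = new Outcome();
        for (String flowFile : batch) {
            if (dbWrite.test(flowFile)) {
                out.success.add(flowFile);     // DB transaction already committed
            } else {
                out.backInQueue.add(flowFile); // stays queued, processed again later
            }
        }
        return out;
    }

    /** Session rollback: on any failure, every item returns to the queue. */
    static Outcome rollbackOnAnyFailure(List<String> batch, Predicate<String> dbWrite) {
        Outcome out = new Outcome();
        boolean failed = false;
        for (String flowFile : batch) {
            if (dbWrite.test(flowFile)) {
                out.success.add(flowFile);
            } else {
                failed = true;
            }
        }
        if (failed) {
            // The session is undone, but the DB writes that succeeded are not,
            // so those items would be written a second time on the next run.
            out.success.clear();
            out.backInQueue.addAll(batch);
        }
        return out;
    }
}
```

With a batch of three FlowFiles where only the second write fails, the first method requeues just that one, while the second requeues all three, including the two whose database writes already went through.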

h3. Q2: Yielding processor
Also, I assume the main purpose of this feature is to avoid sending additional 
requests to the external database, which could slow down its recovery from an 
illegal state. So, I think we should 'yield' the processor when 
'Rollback on Failure' is engaged. Does this seem correct?

h3. Q3: Should we care about the type of failure?
Another question is whether we should keep routing FlowFiles to the 'failure' 
relationship if the cause was a SQLNonTransientException, even if the 
processor's 'Rollback on Failure' property is set to true.
I think that exception indicates something is wrong with the incoming FlowFile 
data, rather than with the external database system. If we keep those 
FlowFiles, it will cause an infinite loop.
Or maybe the processor has a wrong configuration, such as bad prepared 
statement syntax, etc. In that case the user may want to keep the FlowFile in 
the incoming queue, fix the processor first, then run it again.
I think it would be more flexible if we provided those options through a 
property named *Failure Handling Strategy* with multiple choices:

- Keep FlowFiles: Regardless of the type of failure, keep FlowFiles in the 
incoming queue. Good for an experimental flow.
- Keep Recoverable FlowFiles: Recoverable FlowFiles will be kept in the 
incoming queue, and others will be routed to 'failure', assuming the failure 
was caused by the input FlowFile data.
- Transfer to 'failure' and 'retry': DEFAULT, existing behavior. Recoverable 
FlowFiles will be routed to 'retry', and others to 'failure'.
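
As a rough sketch of how the three choices above could map to routing decisions, here is a small plain-Java example. All names ({{FailureHandlingStrategy}}, {{Destination}}, {{FailureHandlingSketch}}, {{route}}, {{isRecoverable}}) are hypothetical and not existing NiFi code. It also assumes that any SQLException other than a SQLNonTransientException counts as recoverable, which the real processors may decide differently.

```java
import java.sql.SQLException;
import java.sql.SQLNonTransientException;
import java.sql.SQLTransientConnectionException;

/** Hypothetical values for the proposed 'Failure Handling Strategy' property. */
enum FailureHandlingStrategy {
    KEEP_FLOWFILES,
    KEEP_RECOVERABLE_FLOWFILES,
    TRANSFER_TO_FAILURE_AND_RETRY // default, existing behavior
}

/** Where a failed FlowFile ends up; SELF means the incoming queue. */
enum Destination { SELF, RETRY, FAILURE }

final class FailureHandlingSketch {

    /** Assumption: only SQLNonTransientException marks a non-recoverable failure. */
    static boolean isRecoverable(SQLException e) {
        return !(e instanceof SQLNonTransientException);
    }

    /** Pick a destination for a failed FlowFile under the given strategy. */
    static Destination route(FailureHandlingStrategy strategy, SQLException e) {
        boolean recoverable = isRecoverable(e);
        switch (strategy) {
            case KEEP_FLOWFILES:
                return Destination.SELF;
            case KEEP_RECOVERABLE_FLOWFILES:
                return recoverable ? Destination.SELF : Destination.FAILURE;
            default: // TRANSFER_TO_FAILURE_AND_RETRY
                return recoverable ? Destination.RETRY : Destination.FAILURE;
        }
    }

    public static void main(String[] args) {
        SQLException badData = new SQLNonTransientException("bad input data");
        SQLException dbDown = new SQLTransientConnectionException("database unavailable");
        for (FailureHandlingStrategy s : FailureHandlingStrategy.values()) {
            System.out.println(s + ": bad data -> " + route(s, badData)
                    + ", database down -> " + route(s, dbDown));
        }
    }
}
```

Only the 'Keep FlowFiles' choice keeps a FlowFile with bad data in the queue, which is where the infinite loop risk mentioned above would come from.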

What do you think?

> Add "Rollback on Failure" property to PutHiveStreaming, PutHiveQL, and PutSQL
> -----------------------------------------------------------------------------
>
>                 Key: NIFI-3415
>                 URL: https://issues.apache.org/jira/browse/NIFI-3415
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Matt Burgess
>            Assignee: Koji Kawamura
>
> Many Put processors (such as PutHiveStreaming, PutHiveQL, and PutSQL) offer 
> "failure" and "retry" relationships for flow files that cannot be processed, 
> perhaps due to issues with the external system or other errors.
> However there are use cases where if a Put fails, then no other flow files 
> should be processed until the issue(s) have been resolved.  This should be 
> configurable for said processors, to enable both the current behavior and a 
> "stop on failure" type of behavior.
> I propose a property be added to the Put processors (at a minimum the 
> PutHiveStreaming, PutHiveQL, and PutSQL processors) called "Rollback on 
> Failure", which offers true or false values.  If set to true, then the 
> "failure" and "retry" relationships should be removed from the processor 
> instance, and if set to false, those relationships should be offered.
> If Rollback on Failure is false, then the processor should continue to behave 
> as it has. If set to true, then if any error occurs while processing a flow 
> file, the session should be rolled back rather than transferring the flow 
> file to some error-handling relationship.
> It may also be the case that if Rollback on Failure is true, then the 
> incoming connection must use a FIFO Prioritizer, but I'm not positive. The 
> documentation should be updated to include any such requirements.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
