Re: Handling of the contents of the failure result in the python FlowFileTransform API

Gábor Gyimesi Mon, 05 Aug 2024 08:08:29 -0700

I was thinking the same that first I also wanted to throw an
exception, but that would just trigger a rollback and we would retry
the flow file later. Still it is reasonable to at least write an error
message in the logs about the bad usage.


I also think that even though it is documented, it is not restricted
to transfer a new flow file to the original relationship. That should
also be logged and removed from the session. I opened a pull request
for these changes in MiNiFi C++, maybe it could also be implemented in
NiFi [1].

[1] https://github.com/apache/nifi-minifi-cpp/pull/1852

On Mon, 5 Aug 2024 at 15:25, Mark Payne <marka...@hotmail.com> wrote:
>
> Marton,
>
> Sure, the idea of “an API should be easy to use correctly and hard to use 
> incorrectly” is certainly a great principle in API design.
> But I’m not entirely sure that it applies here directly. The idea of “make it 
> hard to use incorrectly” doesn’t mean to throw an Exception
> when used incorrectly but rather to make the API such that you’d never even 
> return an object that is invalid. But the idea of significant
> API changes in order to make such a thing difficult is probably far beyond 
> the scope of this thread.
>
> That said, we can consider throwing an Exception if returning a different 
> payload along with the ‘failure’ relationship. But then
> we would need to consider several different concepts, at a higher application 
> level:
>
> 1. What if you return the input payload/FlowFile Content along with the 
> ‘failure’ relationship? Should it be a failure? Or only if the content has 
> changed? If considering only the API then it would make sense that returning 
> the same content would be okay (though inefficient). But that would mean also 
> doing some sort of hash on the returned content and the original content to 
> ensure that it’s the same, etc. which is probably a very bad idea due to 
> performance concerns.
>
> 2. What happens if an Exception is thrown? How does the Exception get 
> handled? We have two options:
>
> - Catch the Exception in the Processor on the Java side; in this case we 
> should handle it like we do other failures: log an error and route the 
> original FlowFile to the ‘failure’ relationship. Now, we’ve not really 
> accomplished anything because the net result is the same - the original 
> FlowFile goes to failure. But we’ve wasted a lot of resources and complicated 
> the codebase to do so.
>
> - Let the Exception fly. Don’t bother catching it at all on the Java side, 
> unless the framework ultimately catches it. The framework will then log an 
> Exception and rollback the session, putting the data back into the queue. 
> Given that the data was intended to be routed to ‘failure’ and this is a 
> Transformation type of Processor (given that we currently only support 
> transformations in python) the transformation will fail again. And again. And 
> the data will just stay in the queue forever, continually failing. While this 
> behavior of the framework makes sense, it is a “last ditch effort” to keep 
> things in line and generally can be avoided if there’s a better option. And 
> given that we already know the developer of the Processor wants to route to 
> failure, I’d say the better option is to route the original FlowFile to 
> failure to begin with.
>
> Thanks
> -Mark
>
> > On Aug 1, 2024, at 8:37 AM, Marton Szasz <sza...@apache.org> wrote:
> >
> > I think good APIs are easy to use correctly and hard to use incorrectly. (I 
> > stole this from C++ books.)
> >
> > For this reason, I'm +1 for throwing an exception when the API user 
> > specifies a content with a transform result headed to failure. Also +1 for 
> > making this clear in the docs.
> >
> > If an exception is not possible or feasible, an error log works, too. But 
> > we should make it as loud as feasible, so the user notices the programming 
> > error early.
> >
> > Thanks,
> > Marton
> >
> > On 7/31/24 19:52, Mark Payne wrote:
> >> Agreed, it should absolutely be documented. I don’t have a particularly 
> >> strong opinion regarding whether or not to log a warning.
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >>> On Jul 31, 2024, at 1:00 PM, Gábor Gyimesi <gamezb...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> MiNiFi C++ copied the behavior of the NiFi Python API to be compatible
> >>> with the NiFi python processors (although some minor differences may
> >>> apply [1]), so it works the same way in this case.
> >>> I think this behavior is understandable and can be accepted, but I
> >>> think Ferenc has a point that this at least should be documented in
> >>> the developer guide. The interface of the FlowFileTransformResult
> >>> suggests that the content can be specified for any relationship, but
> >>> it is discarded in this case without any notice. I suppose it would be
> >>> an overkill and not backwards compatible to change the API for adding
> >>> a failure specific transfer, but I think besides documenting this
> >>> behavior we could add a warning in the logs (or an exception?) if
> >>> someone specifies any content for this relationship.
> >>>
> >>> Regards,
> >>> Gábor
> >>>
> >>> [1] 
> >>> https://github.com/apache/nifi-minifi-cpp/blob/main/extensions/python/PYTHON.md#using-nifi-python-processors
> >>>
> >>> On Wed, 31 Jul 2024 at 18:24, Mark Payne <marka...@hotmail.com> wrote:
> >>>> Hi Ferenc,
> >>>>
> >>>> I’m not overly familiar with MiNiFi C++ but this is how it works in 
> >>>> traditional NiFi as well.
> >>>> This is intended. While Processors are generally free to treat 
> >>>> relationships as they wish,
> >>>> the convention is always to route the original incoming FlowFile to 
> >>>> ‘failure’, never a modified version of it.
> >>>> At least with respect to content. Some attributes might be added, for 
> >>>> example to explain why there was a failure,
> >>>> but the content should not be changed.
> >>>>
> >>>> In the Python API we’ve generally moved toward being more prescriptive 
> >>>> about some of these things.
> >>>> So while it’s possible to do so in Java, it should not be done. In 
> >>>> Python, we don’t even allow it.
> >>>>
> >>>> Thanks
> >>>> -Mark
> >>>>
> >>>>
> >>>>> On Jul 31, 2024, at 10:46 AM, Ferenc Gerlits <fgerl...@apache.org> 
> >>>>> wrote:
> >>>>>
> >>>>> Hello NiFi community,
> >>>>>
> >>>>> We have implemented NiFi's FlowFileTransform API for Python processors
> >>>>> in MiNiFi C++, and we have written a few processors using it. We have
> >>>>> noticed that the "failure" relationship works like this: if the
> >>>>> transform() method returns
> >>>>> FlowFileTransformResult(relationship="failure",
> >>>>> contents=some_contents, attributes=some_attributes), then
> >>>>> - some_contents are discarded,
> >>>>> - some_attributes are merged with the attributes of the original flow 
> >>>>> file,
> >>>>> - and then the original flow file with the merged attributes is
> >>>>> routed to the failure relationship. [1]
> >>>>>
> >>>>> The fact that the contents are discarded was unexpected to me, and it
> >>>>> is not documented in the developer guide [2]. We could add this to the
> >>>>> developer guide, but I think it would make more sense if the failure
> >>>>> result worked in the same way as the success result: if contents is
> >>>>> None, then we use the original contents, but if contents is not None,
> >>>>> then it becomes the new contents of the flow file which is routed to
> >>>>> the failure relationship.
> >>>>>
> >>>>> What do you think?
> >>>>> Ferenc
> >>>>>
> >>>>> [1] 
> >>>>> https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-py4j-bundle/nifi-py4j-bridge/src/main/java/org/apache/nifi/python/processor/FlowFileTransformProxy.java
> >>>>> [2] 
> >>>>> https://github.com/apache/nifi/blob/main/nifi-docs/src/main/asciidoc/python-developer-guide.adoc
>

Re: Handling of the contents of the failure result in the python FlowFileTransform API

Reply via email to