Re: Handling of the contents of the failure result in the python FlowFileTransform API

Mark Payne Mon, 05 Aug 2024 06:25:50 -0700

Marton,

Sure, the idea of “an API should be easy to use correctly and hard to use 
incorrectly” is certainly a great principle in API design.
But I’m not entirely sure that it applies here directly. The idea of “make it 
hard to use incorrectly” doesn’t mean to throw an Exception
when used incorrectly but rather to make the API such that you’d never even 
return an object that is invalid. But the idea of significant
API changes in order to make such a thing difficult is probably far beyond the 
scope of this thread.


That said, we can consider throwing an Exception if returning a different 
payload along with the ‘failure’ relationship. But then
we would need to consider several different concepts, at a higher application 
level:

1. What if you return the input payload/FlowFile Content along with the 
‘failure’ relationship? Should it be a failure? Or only if the content has 
changed? If considering only the API then it would make sense that returning 
the same content would be okay (though inefficient). But that would mean also 
doing some sort of hash on the returned content and the original content to 
ensure that it’s the same, etc. which is probably a very bad idea due to 
performance concerns.

2. What happens if an Exception is thrown? How does the Exception get handled? 
We have two options:

- Catch the Exception in the Processor on the Java side; in this case we should 
handle it like we do other failures: log an error and route the original 
FlowFile to the ‘failure’ relationship. Now, we’ve not really accomplished 
anything because the net result is the same - the original FlowFile goes to 
failure. But we’ve wasted a lot of resources and complicated the codebase to do 
so.

- Let the Exception fly. Don’t bother catching it at all on the Java side, 
unless the framework ultimately catches it. The framework will then log an 
Exception and rollback the session, putting the data back into the queue. Given 
that the data was intended to be routed to ‘failure’ and this is a 
Transformation type of Processor (given that we currently only support 
transformations in python) the transformation will fail again. And again. And 
the data will just stay in the queue forever, continually failing. While this 
behavior of the framework makes sense, it is a “last ditch effort” to keep 
things in line and generally can be avoided if there’s a better option. And 
given that we already know the developer of the Processor wants to route to 
failure, I’d say the better option is to route the original FlowFile to failure 
to begin with.

Thanks
-Mark

> On Aug 1, 2024, at 8:37 AM, Marton Szasz <sza...@apache.org> wrote:
> 
> I think good APIs are easy to use correctly and hard to use incorrectly. (I 
> stole this from C++ books.)
> 
> For this reason, I'm +1 for throwing an exception when the API user specifies 
> a content with a transform result headed to failure. Also +1 for making this 
> clear in the docs.
> 
> If an exception is not possible or feasible, an error log works, too. But we 
> should make it as loud as feasible, so the user notices the programming error 
> early.
> 
> Thanks,
> Marton
> 
> On 7/31/24 19:52, Mark Payne wrote:
>> Agreed, it should absolutely be documented. I don’t have a particularly 
>> strong opinion regarding whether or not to log a warning.
>> 
>> Thanks
>> -Mark
>> 
>> 
>>> On Jul 31, 2024, at 1:00 PM, Gábor Gyimesi <gamezb...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> MiNiFi C++ copied the behavior of the NiFi Python API to be compatible
>>> with the NiFi python processors (although some minor differences may
>>> apply [1]), so it works the same way in this case.
>>> I think this behavior is understandable and can be accepted, but I
>>> think Ferenc has a point that this at least should be documented in
>>> the developer guide. The interface of the FlowFileTransformResult
>>> suggests that the content can be specified for any relationship, but
>>> it is discarded in this case without any notice. I suppose it would be
>>> an overkill and not backwards compatible to change the API for adding
>>> a failure specific transfer, but I think besides documenting this
>>> behavior we could add a warning in the logs (or an exception?) if
>>> someone specifies any content for this relationship.
>>> 
>>> Regards,
>>> Gábor
>>> 
>>> [1] 
>>> https://github.com/apache/nifi-minifi-cpp/blob/main/extensions/python/PYTHON.md#using-nifi-python-processors
>>> 
>>> On Wed, 31 Jul 2024 at 18:24, Mark Payne <marka...@hotmail.com> wrote:
>>>> Hi Ferenc,
>>>> 
>>>> I’m not overly familiar with MiNiFi C++ but this is how it works in 
>>>> traditional NiFi as well.
>>>> This is intended. While Processors are generally free to treat 
>>>> relationships as they wish,
>>>> the convention is always to route the original incoming FlowFile to 
>>>> ‘failure’, never a modified version of it.
>>>> At least with respect to content. Some attributes might be added, for 
>>>> example to explain why there was a failure,
>>>> but the content should not be changed.
>>>> 
>>>> In the Python API we’ve generally moved toward being more prescriptive 
>>>> about some of these things.
>>>> So while it’s possible to do so in Java, it should not be done. In Python, 
>>>> we don’t even allow it.
>>>> 
>>>> Thanks
>>>> -Mark
>>>> 
>>>> 
>>>>> On Jul 31, 2024, at 10:46 AM, Ferenc Gerlits <fgerl...@apache.org> wrote:
>>>>> 
>>>>> Hello NiFi community,
>>>>> 
>>>>> We have implemented NiFi's FlowFileTransform API for Python processors
>>>>> in MiNiFi C++, and we have written a few processors using it. We have
>>>>> noticed that the "failure" relationship works like this: if the
>>>>> transform() method returns
>>>>> FlowFileTransformResult(relationship="failure",
>>>>> contents=some_contents, attributes=some_attributes), then
>>>>> - some_contents are discarded,
>>>>> - some_attributes are merged with the attributes of the original flow 
>>>>> file,
>>>>> - and then the original flow file with the merged attributes is
>>>>> routed to the failure relationship. [1]
>>>>> 
>>>>> The fact that the contents are discarded was unexpected to me, and it
>>>>> is not documented in the developer guide [2]. We could add this to the
>>>>> developer guide, but I think it would make more sense if the failure
>>>>> result worked in the same way as the success result: if contents is
>>>>> None, then we use the original contents, but if contents is not None,
>>>>> then it becomes the new contents of the flow file which is routed to
>>>>> the failure relationship.
>>>>> 
>>>>> What do you think?
>>>>> Ferenc
>>>>> 
>>>>> [1] 
>>>>> https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-py4j-bundle/nifi-py4j-bridge/src/main/java/org/apache/nifi/python/processor/FlowFileTransformProxy.java
>>>>> [2] 
>>>>> https://github.com/apache/nifi/blob/main/nifi-docs/src/main/asciidoc/python-developer-guide.adoc

Re: Handling of the contents of the failure result in the python FlowFileTransform API

Reply via email to