Marton, Sure, the idea of “an API should be easy to use correctly and hard to use incorrectly” is certainly a great principle in API design. But I’m not entirely sure that it applies here directly. The idea of “make it hard to use incorrectly” doesn’t mean to throw an Exception when used incorrectly but rather to make the API such that you’d never even return an object that is invalid. But the idea of significant API changes in order to make such a thing difficult is probably far beyond the scope of this thread.
That said, we can consider throwing an Exception if returning a different payload along with the ‘failure’ relationship. But then we would need to consider several different concepts, at a higher application level: 1. What if you return the input payload/FlowFile Content along with the ‘failure’ relationship? Should it be a failure? Or only if the content has changed? If considering only the API then it would make sense that returning the same content would be okay (though inefficient). But that would mean also doing some sort of hash on the returned content and the original content to ensure that it’s the same, etc. which is probably a very bad idea due to performance concerns. 2. What happens if an Exception is thrown? How does the Exception get handled? We have two options: - Catch the Exception in the Processor on the Java side; in this case we should handle it like we do other failures: log an error and route the original FlowFile to the ‘failure’ relationship. Now, we’ve not really accomplished anything because the net result is the same - the original FlowFile goes to failure. But we’ve wasted a lot of resources and complicated the codebase to do so. - Let the Exception fly. Don’t bother catching it at all on the Java side, unless the framework ultimately catches it. The framework will then log an Exception and rollback the session, putting the data back into the queue. Given that the data was intended to be routed to ‘failure’ and this is a Transformation type of Processor (given that we currently only support transformations in python) the transformation will fail again. And again. And the data will just stay in the queue forever, continually failing. While this behavior of the framework makes sense, it is a “last ditch effort” to keep things in line and generally can be avoided if there’s a better option. And given that we already know the developer of the Processor wants to route to failure, I’d say the better option is to route the original FlowFile to failure to begin with. Thanks -Mark > On Aug 1, 2024, at 8:37 AM, Marton Szasz <sza...@apache.org> wrote: > > I think good APIs are easy to use correctly and hard to use incorrectly. (I > stole this from C++ books.) > > For this reason, I'm +1 for throwing an exception when the API user specifies > a content with a transform result headed to failure. Also +1 for making this > clear in the docs. > > If an exception is not possible or feasible, an error log works, too. But we > should make it as loud as feasible, so the user notices the programming error > early. > > Thanks, > Marton > > On 7/31/24 19:52, Mark Payne wrote: >> Agreed, it should absolutely be documented. I don’t have a particularly >> strong opinion regarding whether or not to log a warning. >> >> Thanks >> -Mark >> >> >>> On Jul 31, 2024, at 1:00 PM, Gábor Gyimesi <gamezb...@gmail.com> wrote: >>> >>> Hi, >>> >>> MiNiFi C++ copied the behavior of the NiFi Python API to be compatible >>> with the NiFi python processors (although some minor differences may >>> apply [1]), so it works the same way in this case. >>> I think this behavior is understandable and can be accepted, but I >>> think Ferenc has a point that this at least should be documented in >>> the developer guide. The interface of the FlowFileTransformResult >>> suggests that the content can be specified for any relationship, but >>> it is discarded in this case without any notice. I suppose it would be >>> an overkill and not backwards compatible to change the API for adding >>> a failure specific transfer, but I think besides documenting this >>> behavior we could add a warning in the logs (or an exception?) if >>> someone specifies any content for this relationship. >>> >>> Regards, >>> Gábor >>> >>> [1] >>> https://github.com/apache/nifi-minifi-cpp/blob/main/extensions/python/PYTHON.md#using-nifi-python-processors >>> >>> On Wed, 31 Jul 2024 at 18:24, Mark Payne <marka...@hotmail.com> wrote: >>>> Hi Ferenc, >>>> >>>> I’m not overly familiar with MiNiFi C++ but this is how it works in >>>> traditional NiFi as well. >>>> This is intended. While Processors are generally free to treat >>>> relationships as they wish, >>>> the convention is always to route the original incoming FlowFile to >>>> ‘failure’, never a modified version of it. >>>> At least with respect to content. Some attributes might be added, for >>>> example to explain why there was a failure, >>>> but the content should not be changed. >>>> >>>> In the Python API we’ve generally moved toward being more prescriptive >>>> about some of these things. >>>> So while it’s possible to do so in Java, it should not be done. In Python, >>>> we don’t even allow it. >>>> >>>> Thanks >>>> -Mark >>>> >>>> >>>>> On Jul 31, 2024, at 10:46 AM, Ferenc Gerlits <fgerl...@apache.org> wrote: >>>>> >>>>> Hello NiFi community, >>>>> >>>>> We have implemented NiFi's FlowFileTransform API for Python processors >>>>> in MiNiFi C++, and we have written a few processors using it. We have >>>>> noticed that the "failure" relationship works like this: if the >>>>> transform() method returns >>>>> FlowFileTransformResult(relationship="failure", >>>>> contents=some_contents, attributes=some_attributes), then >>>>> - some_contents are discarded, >>>>> - some_attributes are merged with the attributes of the original flow >>>>> file, >>>>> - and then the original flow file with the merged attributes is >>>>> routed to the failure relationship. [1] >>>>> >>>>> The fact that the contents are discarded was unexpected to me, and it >>>>> is not documented in the developer guide [2]. We could add this to the >>>>> developer guide, but I think it would make more sense if the failure >>>>> result worked in the same way as the success result: if contents is >>>>> None, then we use the original contents, but if contents is not None, >>>>> then it becomes the new contents of the flow file which is routed to >>>>> the failure relationship. >>>>> >>>>> What do you think? >>>>> Ferenc >>>>> >>>>> [1] >>>>> https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-py4j-bundle/nifi-py4j-bridge/src/main/java/org/apache/nifi/python/processor/FlowFileTransformProxy.java >>>>> [2] >>>>> https://github.com/apache/nifi/blob/main/nifi-docs/src/main/asciidoc/python-developer-guide.adoc