I was thinking the same that first I also wanted to throw an exception, but that would just trigger a rollback and we would retry the flow file later. Still it is reasonable to at least write an error message in the logs about the bad usage.
I also think that even though it is documented, it is not restricted to transfer a new flow file to the original relationship. That should also be logged and removed from the session. I opened a pull request for these changes in MiNiFi C++, maybe it could also be implemented in NiFi [1]. [1] https://github.com/apache/nifi-minifi-cpp/pull/1852 On Mon, 5 Aug 2024 at 15:25, Mark Payne <marka...@hotmail.com> wrote: > > Marton, > > Sure, the idea of “an API should be easy to use correctly and hard to use > incorrectly” is certainly a great principle in API design. > But I’m not entirely sure that it applies here directly. The idea of “make it > hard to use incorrectly” doesn’t mean to throw an Exception > when used incorrectly but rather to make the API such that you’d never even > return an object that is invalid. But the idea of significant > API changes in order to make such a thing difficult is probably far beyond > the scope of this thread. > > That said, we can consider throwing an Exception if returning a different > payload along with the ‘failure’ relationship. But then > we would need to consider several different concepts, at a higher application > level: > > 1. What if you return the input payload/FlowFile Content along with the > ‘failure’ relationship? Should it be a failure? Or only if the content has > changed? If considering only the API then it would make sense that returning > the same content would be okay (though inefficient). But that would mean also > doing some sort of hash on the returned content and the original content to > ensure that it’s the same, etc. which is probably a very bad idea due to > performance concerns. > > 2. What happens if an Exception is thrown? How does the Exception get > handled? We have two options: > > - Catch the Exception in the Processor on the Java side; in this case we > should handle it like we do other failures: log an error and route the > original FlowFile to the ‘failure’ relationship. Now, we’ve not really > accomplished anything because the net result is the same - the original > FlowFile goes to failure. But we’ve wasted a lot of resources and complicated > the codebase to do so. > > - Let the Exception fly. Don’t bother catching it at all on the Java side, > unless the framework ultimately catches it. The framework will then log an > Exception and rollback the session, putting the data back into the queue. > Given that the data was intended to be routed to ‘failure’ and this is a > Transformation type of Processor (given that we currently only support > transformations in python) the transformation will fail again. And again. And > the data will just stay in the queue forever, continually failing. While this > behavior of the framework makes sense, it is a “last ditch effort” to keep > things in line and generally can be avoided if there’s a better option. And > given that we already know the developer of the Processor wants to route to > failure, I’d say the better option is to route the original FlowFile to > failure to begin with. > > Thanks > -Mark > > > On Aug 1, 2024, at 8:37 AM, Marton Szasz <sza...@apache.org> wrote: > > > > I think good APIs are easy to use correctly and hard to use incorrectly. (I > > stole this from C++ books.) > > > > For this reason, I'm +1 for throwing an exception when the API user > > specifies a content with a transform result headed to failure. Also +1 for > > making this clear in the docs. > > > > If an exception is not possible or feasible, an error log works, too. But > > we should make it as loud as feasible, so the user notices the programming > > error early. > > > > Thanks, > > Marton > > > > On 7/31/24 19:52, Mark Payne wrote: > >> Agreed, it should absolutely be documented. I don’t have a particularly > >> strong opinion regarding whether or not to log a warning. > >> > >> Thanks > >> -Mark > >> > >> > >>> On Jul 31, 2024, at 1:00 PM, Gábor Gyimesi <gamezb...@gmail.com> wrote: > >>> > >>> Hi, > >>> > >>> MiNiFi C++ copied the behavior of the NiFi Python API to be compatible > >>> with the NiFi python processors (although some minor differences may > >>> apply [1]), so it works the same way in this case. > >>> I think this behavior is understandable and can be accepted, but I > >>> think Ferenc has a point that this at least should be documented in > >>> the developer guide. The interface of the FlowFileTransformResult > >>> suggests that the content can be specified for any relationship, but > >>> it is discarded in this case without any notice. I suppose it would be > >>> an overkill and not backwards compatible to change the API for adding > >>> a failure specific transfer, but I think besides documenting this > >>> behavior we could add a warning in the logs (or an exception?) if > >>> someone specifies any content for this relationship. > >>> > >>> Regards, > >>> Gábor > >>> > >>> [1] > >>> https://github.com/apache/nifi-minifi-cpp/blob/main/extensions/python/PYTHON.md#using-nifi-python-processors > >>> > >>> On Wed, 31 Jul 2024 at 18:24, Mark Payne <marka...@hotmail.com> wrote: > >>>> Hi Ferenc, > >>>> > >>>> I’m not overly familiar with MiNiFi C++ but this is how it works in > >>>> traditional NiFi as well. > >>>> This is intended. While Processors are generally free to treat > >>>> relationships as they wish, > >>>> the convention is always to route the original incoming FlowFile to > >>>> ‘failure’, never a modified version of it. > >>>> At least with respect to content. Some attributes might be added, for > >>>> example to explain why there was a failure, > >>>> but the content should not be changed. > >>>> > >>>> In the Python API we’ve generally moved toward being more prescriptive > >>>> about some of these things. > >>>> So while it’s possible to do so in Java, it should not be done. In > >>>> Python, we don’t even allow it. > >>>> > >>>> Thanks > >>>> -Mark > >>>> > >>>> > >>>>> On Jul 31, 2024, at 10:46 AM, Ferenc Gerlits <fgerl...@apache.org> > >>>>> wrote: > >>>>> > >>>>> Hello NiFi community, > >>>>> > >>>>> We have implemented NiFi's FlowFileTransform API for Python processors > >>>>> in MiNiFi C++, and we have written a few processors using it. We have > >>>>> noticed that the "failure" relationship works like this: if the > >>>>> transform() method returns > >>>>> FlowFileTransformResult(relationship="failure", > >>>>> contents=some_contents, attributes=some_attributes), then > >>>>> - some_contents are discarded, > >>>>> - some_attributes are merged with the attributes of the original flow > >>>>> file, > >>>>> - and then the original flow file with the merged attributes is > >>>>> routed to the failure relationship. [1] > >>>>> > >>>>> The fact that the contents are discarded was unexpected to me, and it > >>>>> is not documented in the developer guide [2]. We could add this to the > >>>>> developer guide, but I think it would make more sense if the failure > >>>>> result worked in the same way as the success result: if contents is > >>>>> None, then we use the original contents, but if contents is not None, > >>>>> then it becomes the new contents of the flow file which is routed to > >>>>> the failure relationship. > >>>>> > >>>>> What do you think? > >>>>> Ferenc > >>>>> > >>>>> [1] > >>>>> https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-py4j-bundle/nifi-py4j-bridge/src/main/java/org/apache/nifi/python/processor/FlowFileTransformProxy.java > >>>>> [2] > >>>>> https://github.com/apache/nifi/blob/main/nifi-docs/src/main/asciidoc/python-developer-guide.adoc >