Hi Team,

Recently there was a request [1] to support splitting a flow file into multiple flow files using the Python FlowFileTransform API, so that a single incoming flow file results in multiple outgoing flow files. A valid use case was presented for this: "Input is a single flowfile which contains an excel file, and output would be multiple flowfiles, where each flowfile will contain one sheet from the excel file."
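To make the use case concrete, here is a rough sketch of the "one flow file in, several flow files out" shape. It deliberately does not use the real nifiapi classes: `OutputFlowFile`, `split_workbook`, and the attribute names are hypothetical stand-ins chosen for illustration, and the Excel parsing itself is assumed to have happened already, so the input is a plain mapping of sheet name to serialized sheet content.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OutputFlowFile:
    """Hypothetical stand-in for one outgoing flow file (not a nifiapi class)."""
    relationship: str
    contents: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def split_workbook(sheets: Dict[str, bytes]) -> List[OutputFlowFile]:
    """Return one output per sheet.

    `sheets` maps a sheet name to that sheet's already-serialized content;
    reading the workbook is out of scope for this sketch.
    """
    return [
        OutputFlowFile(
            relationship="success",
            contents=data,
            # Attribute names are illustrative assumptions only.
            attributes={"sheet.name": name, "fragment.index": str(index)},
        )
        for index, (name, data) in enumerate(sheets.items())
    ]


# One incoming "workbook" with two sheets produces two outgoing flow files.
outputs = split_workbook({"Q1": b"a,b\n1,2\n", "Q2": b"a,b\n3,4\n"})
```

The only point of the sketch is the return type: a list of results instead of a single result. How that list would be exposed in the actual API is exactly what is open for discussion.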
As Joe Witt commented on the ticket, the current APIs only support the one-flow-file-in/one-flow-file-out model, whereas this request asks the Python API to support a single flow file in and several flow files out. I think this is a good idea, and I think it could be generalized to other types of Python processors as well.

There was a merged PR [2] to support source Python processors, and I think we should support multiple flow file outputs for source processors too. There could be use cases like the ListenTCP processor, or any polling processor that periodically checks a queue and creates flow files from all the new entries since the last trigger. A source processor could be written to return multiple records in a single flow file, which is then split with the SplitRecord processor, but that is more of a workaround than a solution.

With the polling type of processor mentioned above, there could be triggers when no new entries are available at all, so no flow file can be generated. Because of this, I also suggested a change to the API to allow returning no new flow files in a trigger [3]. We may also consider adding the option to yield for some time in this case.

So there are a couple of questions for the community:

1. Do you agree to add support for multiple flow file outputs in the Python API for both transform and source processors?
2. Do you agree to add support for returning no flow files from source processors?
3. Do you think we should add an option to yield when no output flow files are returned, or does that complicate the API too much for a user?

I also think these changes should be implemented before the NiFi 2.0 release. When I talked with Peter Gyori, he said he had already started working on the "no output" feature and would be happy to work on the multiple flow file output change as well. I would also be happy to help him and to port these changes to the MiNiFi C++ side.
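For the source processor questions, a minimal sketch of a trigger that returns zero or more outputs, with an optional yield hint when nothing is available, might look like the following. Everything here is an assumption for the sake of discussion: `SourceResult`, `poll_once`, `yield_seconds`, and `idle_yield_seconds` are hypothetical names, not part of the existing Python API, and an in-memory deque stands in for whatever queue a real processor would poll.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class SourceResult:
    """Hypothetical result of one trigger: zero or more outputs plus a yield hint."""
    contents: List[bytes] = field(default_factory=list)
    yield_seconds: float = 0.0  # 0.0 means "do not yield"


def poll_once(queue: Deque[bytes], idle_yield_seconds: float = 1.0) -> SourceResult:
    """Drain every entry that arrived since the last trigger."""
    entries: List[bytes] = []
    while queue:
        entries.append(queue.popleft())
    if not entries:
        # Nothing new: return no flow files and ask the framework to back off.
        return SourceResult(yield_seconds=idle_yield_seconds)
    # One outgoing flow file per new entry.
    return SourceResult(contents=entries)


pending = deque([b"first", b"second"])
busy = poll_once(pending)  # two entries available, so two outputs
idle = poll_once(pending)  # queue is empty now, so no outputs and a yield hint
```

Carrying the yield hint on the result object is only one possible design; it keeps the "no output" case and the "yield" case in the same return value, which is part of what question 3 is asking about.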
Feel free to comment with any request or requirement on the related API change.

Regards,
Gabor

[1] https://issues.apache.org/jira/browse/NIFI-13402
[2] https://github.com/apache/nifi/pull/9000
[3] https://issues.apache.org/jira/browse/NIFI-13604