Peter Turcsanyi created NIFI-9390:
-------------------------------------
Summary: MergeContent does not work properly in Kafka Connect
Stateless flows
Key: NIFI-9390
URL: https://issues.apache.org/jira/browse/NIFI-9390
Project: Apache NiFi
Issue Type: Bug
Reporter: Peter Turcsanyi
MergeContent does not work in Kafka Connect Stateless flows if the processor is
not the first one in the flow.
NIFI-8469 solved this issue for flows similar to the one described in that
Jira issue:
GetFile --> SplitText --> ReplaceText --> MergeContent --> PutS3Object
In this scenario, GetFile reads a file and creates a FlowFile (FF) from it.
SplitText creates multiple FFs. ReplaceText processes all the FFs first, and
MergeContent is triggered only after ReplaceText has finished with all of them.
MergeContent can therefore merge the FFs (but only splits coming from the same
input file read by GetFile).
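
The behaviour described above can be illustrated with a small toy model. This is
only a sketch and does not use any NiFi API: the function names
(run_while_ready, split_text, replace_text, merge_content) are hypothetical
stand-ins for the processors, and flowfiles are modelled as plain strings.

```python
from collections import deque


def run_while_ready(process, in_queue):
    """Trigger `process` repeatedly until its input queue is empty."""
    out_queue = deque()
    while in_queue:
        out_queue.extend(process(in_queue.popleft()))
    return out_queue


def split_text(ff):
    # Stand-in for SplitText: one flowfile per line.
    return ff.splitlines()


def replace_text(ff):
    # Stand-in for ReplaceText: some per-flowfile transformation.
    return [ff.upper()]


def merge_content(queue):
    # Stand-in for MergeContent: merges whatever is queued when it is triggered.
    merged = "\n".join(queue)
    queue.clear()
    return merged


source_ff = "a\nb\nc"                       # the single FF created by GetFile
splits = deque(split_text(source_ff))       # SplitText emits three FFs
replaced = run_while_ready(replace_text, splits)
bin_size = len(replaced)                    # FFs queued in front of MergeContent
merged = merge_content(replaced)
```

Because the stand-in for ReplaceText is drained completely before the merge step
runs, all three splits are queued together and end up in one merged output.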
A Kafka Connect Stateless Sink flow may look like this:
Input port --> "Process/Transform messages" --> MergeContent --> PutS3Object
The Kafka Connect framework polls some messages from the Kafka topic and
enqueues them in the stateless flow. The first processor is then triggered with
one FF, and that FF is sent downstream to the end of the flow. MergeContent only
ever sees one FF at a time, so it cannot merge multiple files.
So the issue is that the first processor gets triggered for each FF separately
(which is fine for GetFile but not for the Kafka Connect Stateless flow). Only
the subsequent processors get triggered in a "triggerWhileReady" manner:
[https://github.com/apache/nifi/blob/18fc492e4ce97d2bca4f96df1a1f1eb2b3e80899/nifi-stateless/nifi-stateless-bundle/nifi-stateless-engine/src/main/java/org/apache/nifi/stateless/flow/StandardStatelessFlowCurrent.java#L68-L77]
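
A rough simulation of the sink scenario, to make the symptom concrete. This is
an assumption-laden sketch, not the code of StandardStatelessFlowCurrent:
run_sink_flow, transform and merge_content are hypothetical names, and messages
are modelled as plain strings.

```python
from collections import deque


def transform(ff):
    # Stand-in for the "Process/Transform messages" step.
    return ff.strip()


def merge_content(queue, bin_sizes):
    # Stand-in for MergeContent: records how many FFs it saw, then merges them.
    bin_sizes.append(len(queue))
    merged = "\n".join(queue)
    queue.clear()
    return merged


def run_sink_flow(polled_messages):
    """First processor is triggered once per FF; each FF runs to the end of the flow."""
    input_queue = deque(polled_messages)   # messages enqueued by Kafka Connect
    outputs, bin_sizes = [], []
    while input_queue:
        ff = input_queue.popleft()         # one trigger, one FF
        merge_queue = deque([transform(ff)])
        outputs.append(merge_content(merge_queue, bin_sizes))
    return outputs, bin_sizes


outputs, bin_sizes = run_sink_flow([" m1 ", " m2 ", " m3 "])
```

In this model three polled messages produce three separate outputs, and the
merge step never sees more than one FF per trigger, which matches the behaviour
reported here.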
--
This message was sent by Atlassian Jira
(v8.20.1#820001)