Peter Turcsanyi created NIFI-9390:
-------------------------------------

             Summary: MergeContent does not work properly in Kafka Connect 
Stateless flows
                 Key: NIFI-9390
                 URL: https://issues.apache.org/jira/browse/NIFI-9390
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: Peter Turcsanyi


MergeContent does not work in Kafka Connect Stateless flows if the processor is 
not the first one in the flow.

NIFI-8469 solved this issue for flows similar to the ones mentioned in that 
jira:
GetFile --> SplitText --> ReplaceText --> MergeContent --> PutS3Object
In this scenario GetFile reads a file and creates a FF from it. SplitText 
creates multiple FFs. ReplaceText processes all the FFs first and MergeContent 
will be triggered only after ReplaceText finished with all FFs. So it is able 
to merge the FFs (but only splits coming from the same input file read by 
GetFile).

A Kafka Connect Stateless Sink flow may look like this:
Input port --> "Process/Transform messages" --> MergeContent --> PutS3Object
The Kafka Connect framework polls some messages from the Kafka topic that will 
be enqueued in the stateless flow. Then the first processor gets triggered with 
one FF. This FF is sent downstream till the end of the flow. MergeContent can 
only see one FF at a time so it cannot merge multiple files.

So the issue is that the first processor gets triggered for each flowfile 
separately (which is fine for GetFile but not for the KC Stateless flow). Only 
subsequent processors gets triggered in "triggerWhileReady" way:
[https://github.com/apache/nifi/blob/18fc492e4ce97d2bca4f96df1a1f1eb2b3e80899/nifi-stateless/nifi-stateless-bundle/nifi-stateless-engine/src/main/java/org/apache/nifi/stateless/flow/StandardStatelessFlowCurrent.java#L68-L77]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to