[ 
https://issues.apache.org/jira/browse/NIFI-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Turcsanyi updated NIFI-9390:
----------------------------------
    Description: 
MergeContent does not work in Kafka Connect Stateless flows if the processor is 
not the first one in the flow.

NIFI-8469 solved this issue for flows similar to the ones mentioned in that 
jira:
GetFile --> SplitText --> ReplaceText --> MergeContent --> PutS3Object
In this scenario GetFile reads a file and creates a FF from it. SplitText 
creates multiple FFs. ReplaceText processes all the FFs first and MergeContent 
will be triggered only after ReplaceText finished with all FFs. So it is able 
to merge the FFs (but only splits coming from the same input file read by 
GetFile).

A Kafka Connect Stateless Sink flow may look like this:
Input port --> "Process/Transform messages" --> MergeContent --> PutS3Object
The Kafka Connect framework polls some messages from the Kafka topic that will 
be enqueued in the stateless flow. Then the first processor gets triggered with 
one FF. This FF is sent downstream till the end of the flow. MergeContent can 
only see one FF at a time so it cannot merge multiple files.

So the issue is that the first processor gets triggered for each flowfile 
separately (which is fine for GetFile but not for the KC Stateless flow). Only 
subsequent processors get triggered in "triggerWhileReady" way:
[https://github.com/apache/nifi/blob/18fc492e4ce97d2bca4f96df1a1f1eb2b3e80899/nifi-stateless/nifi-stateless-bundle/nifi-stateless-engine/src/main/java/org/apache/nifi/stateless/flow/StandardStatelessFlowCurrent.java#L68-L77]

  was:
MergeContent does not work in Kafka Connect Stateless flows if the processor is 
not the first one in the flow.

NIFI-8469 solved this issue for flows similar to the ones mentioned in that 
jira:
GetFile --> SplitText --> ReplaceText --> MergeContent --> PutS3Object
In this scenario GetFile reads a file and creates a FF from it. SplitText 
creates multiple FFs. ReplaceText processes all the FFs first and MergeContent 
will be triggered only after ReplaceText finished with all FFs. So it is able 
to merge the FFs (but only splits coming from the same input file read by 
GetFile).

A Kafka Connect Stateless Sink flow may look like this:
Input port --> "Process/Transform messages" --> MergeContent --> PutS3Object
The Kafka Connect framework polls some messages from the Kafka topic that will 
be enqueued in the stateless flow. Then the first processor gets triggered with 
one FF. This FF is sent downstream till the end of the flow. MergeContent can 
only see one FF at a time so it cannot merge multiple files.

So the issue is that the first processor gets triggered for each flowfile 
separately (which is fine for GetFile but not for the KC Stateless flow). Only 
subsequent processors gets triggered in "triggerWhileReady" way:
[https://github.com/apache/nifi/blob/18fc492e4ce97d2bca4f96df1a1f1eb2b3e80899/nifi-stateless/nifi-stateless-bundle/nifi-stateless-engine/src/main/java/org/apache/nifi/stateless/flow/StandardStatelessFlowCurrent.java#L68-L77]


> MergeContent does not work properly in Kafka Connect Stateless flows
> --------------------------------------------------------------------
>
>                 Key: NIFI-9390
>                 URL: https://issues.apache.org/jira/browse/NIFI-9390
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: NiFi Stateless
>            Reporter: Peter Turcsanyi
>            Priority: Major
>
> MergeContent does not work in Kafka Connect Stateless flows if the processor 
> is not the first one in the flow.
> NIFI-8469 solved this issue for flows similar to the ones mentioned in that 
> jira:
> GetFile --> SplitText --> ReplaceText --> MergeContent --> PutS3Object
> In this scenario GetFile reads a file and creates a FF from it. SplitText 
> creates multiple FFs. ReplaceText processes all the FFs first and 
> MergeContent will be triggered only after ReplaceText finished with all FFs. 
> So it is able to merge the FFs (but only splits coming from the same input 
> file read by GetFile).
> A Kafka Connect Stateless Sink flow may look like this:
> Input port --> "Process/Transform messages" --> MergeContent --> PutS3Object
> The Kafka Connect framework polls some messages from the Kafka topic that 
> will be enqueued in the stateless flow. Then the first processor gets 
> triggered with one FF. This FF is sent downstream till the end of the flow. 
> MergeContent can only see one FF at a time so it cannot merge multiple files.
> So the issue is that the first processor gets triggered for each flowfile 
> separately (which is fine for GetFile but not for the KC Stateless flow). 
> Only subsequent processors get triggered in "triggerWhileReady" way:
> [https://github.com/apache/nifi/blob/18fc492e4ce97d2bca4f96df1a1f1eb2b3e80899/nifi-stateless/nifi-stateless-bundle/nifi-stateless-engine/src/main/java/org/apache/nifi/stateless/flow/StandardStatelessFlowCurrent.java#L68-L77]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to