[ 
https://issues.apache.org/jira/browse/NIFI-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774055#comment-17774055
 ] 

Matt Burgess commented on NIFI-11789:
-------------------------------------

That is correct. Although the PR fixes your use case, I believe it would take a 
significant refactor to keep track of the FlowFile "groups" so that we know how 
many FlowFiles are in each group by the time the session is committed. 
FlowFiles are transferred at different points in the code, but they are not 
actually sent until the session is committed. Your use case was more 
straightforward, but depending on the configuration of the processor, different 
FlowFiles are transferred at different points and the session is committed 
later. Once a FlowFile has been transferred it cannot be altered (i.e. the 
fragment.count attribute cannot be set).
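
To illustrate the constraint, here is a toy model in Python. It is only a 
sketch: ToySession and emit are hypothetical names and are not the NiFi 
ProcessSession API, and the 0-based fragment.index is chosen for illustration. 
It shows that when fragments are transferred and committed in batches, the 
total is known only after the last fragment, by which point the earlier 
fragments can no longer be altered.

```python
from types import MappingProxyType


class ToySession:
    """Hypothetical stand-in for a session; not the NiFi ProcessSession API."""

    def __init__(self):
        self.pending = []  # transferred, waiting for commit
        self.sent = []     # committed and delivered downstream

    def transfer(self, attributes):
        # Store a read-only view: a transferred FlowFile cannot be altered.
        self.pending.append(MappingProxyType(dict(attributes)))

    def commit(self):
        self.sent.extend(self.pending)
        self.pending.clear()


def emit(session, fragments, batch_size):
    """Transfer fragments one by one, committing after each full batch."""
    count = 0
    for _ in fragments:  # the total is unknown until the iterator ends
        session.transfer({"fragment.index": str(count)})
        count += 1
        if count % batch_size == 0:
            session.commit()
    session.commit()  # flush the final, possibly partial, batch
    return count


session = ToySession()
total = emit(session, iter(range(5)), batch_size=2)

# Only now is the total known, but the first fragment is already frozen.
try:
    session.sent[0]["fragment.count"] = str(total)
    altered = True
except TypeError:
    altered = False
```

In this model the only way to stamp every fragment with the count would be to 
hold all of them back until the end, which defeats the purpose of batching the 
output.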

If someone wants to start with my PR and has a better approach, or wants to 
start from scratch, I encourage them to do so. Also if I feel like revisiting 
it with fresh eyes down the road, I may do that.

As a workaround, if the outgoing FlowFiles have the correct number of records 
in them (meaning the record count is accurate for what the fragment.count 
attribute SHOULD be and the fragment.index attribute values are correct), you 
can pass the FlowFiles through a CalculateRecordStats processor and then an 
UpdateAttribute processor that sets "fragment.count" to the value of 
"record.count", and the merge should work downstream.
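
The UpdateAttribute step could be configured roughly as follows. This is a 
sketch, assuming CalculateRecordStats has already written record.count on each 
FlowFile; the dynamic property name is the attribute to set, and the value uses 
NiFi Expression Language.

```text
UpdateAttribute (dynamic property)
  fragment.count = ${record.count}
```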

> ExecuteSQL doesn't set fragment.count attribute when Output Batch Size is set
> -----------------------------------------------------------------------------
>
>                 Key: NIFI-11789
>                 URL: https://issues.apache.org/jira/browse/NIFI-11789
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Tamas Neumer
>            Assignee: Matt Burgess
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi,
> I am working with the ExecuteSQL processor and discovered an unexpected 
> behavior. If I specify the property "Output Batch Size", I get the 
> fragment.index on the outgoing FlowFiles, but the fragment.count 
> attribute is not set (according to the documentation).
> The behavior I would expect (in line with how merge processors work) is that 
> the attribute fragment.count is just set at the last Flowfile for the batch. 
> This would make it possible to merge all the batches together afterward.
> So my proposal, in short, is that the fragment.count should be set in the 
> last Flowfile of a batch. 
> BR Florian



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
