[ 
https://issues.apache.org/jira/browse/NIFI-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807161#comment-16807161
 ] 

ASF subversion and git services commented on NIFI-5918:
-------------------------------------------------------

Commit e5ddae54efe229a2eb033a694b6c82c3ebf62018 in nifi's branch 
refs/heads/NIFI-6169-RC1 from Koji Kawamura
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=e5ddae5 ]

NIFI-5918 Fix issue with MergeRecord when DefragmentStrategy is on

Added a unit test representing the fixed issue.
And updated the existing testDefragment test to illustrate
the remaining FlowFiles that did not meet the threshold.


> MergeRecord works wrong with Defragment strategy
> ------------------------------------------------
>
>                 Key: NIFI-5918
>                 URL: https://issues.apache.org/jira/browse/NIFI-5918
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.8.0
>            Reporter: Alexander Bukarev
>            Assignee: Alexander Bukarev
>            Priority: Major
>             Fix For: 1.10.0, 1.9.2
>
>         Attachments: NIFI-5918_MergeRecord.xml
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Steps*
> # Create the simple flow: 
> #* {{GenerateFlowFile}} (with constant payload "txt1,txt2" and 10 sec 
> scheduling) 
> #* -> {{SplitContent}} (with comma as a separator)
> #* -> some chain of processors that takes "txt1" and "txt2" as inbound 
> parameters and produces flow files with more than 1 record ((!) that's 
> important). For example, I use {{ExtractText}} (to get "txt1" and "txt2" as 
> attributes), then {{ExecuteSQLRecord}} (to execute SQL using "txt1" and 
> "txt2" as parameters)
> #* -> {{MergeRecord}} (with *Defragment* merge strategy - (!) that's 
> important)
> #* -> {{LogAttribute}} or whatever you prefer to observe the merge result
> # Now just run the flow
> *Result:* we'll see an error in logs like {panel}Could not merge bin with 1 
> FlowFiles because of the 'fragment.count' attribute had a value of '2' but 
> only 1 of 2 FlowFiles were encountered before this bin was evicted (due to to 
> Max Bin Age being reached or due to the Maximum Number of Bins being 
> exceeded).{panel}
> *Expected result:* a flow file containing records from both SQL queries 
> (for "txt1" and "txt2")
> The cause is that {{RecordBinManager}} uses the {{fragment.count}} flow file 
> attribute as the number of *records* required to release the bin. However, 
> the attribute holds the number of *flow files*. In the scenario above each 
> flow file contains more than 1 record (at least 2), so {{RecordBin}} 
> considers the bin "full enough" as soon as the first flow file arrives 
> (it contains >= 2 records and {{fragment.count}} is 2). The bin is 
> therefore released prematurely.
> I think this is a mistake: in *Defragment* mode we are interested in the 
> number of flow files and never in the number of records. Conversely, the 
> number of records should matter only when using the Bin-Packing Algorithm 
> strategy.
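
The mismatch described in the report can be illustrated with a small
standalone sketch. This is not NiFi code: the class, the Fragment type and
both method names are hypothetical, and the two checks only model the
comparison the reporter describes (fragment.count compared against a record
total versus against a flow file total).

```java
import java.util.List;

public class DefragmentThresholdSketch {

    // Hypothetical stand-in for a flow file: only the two values that matter here.
    static final class Fragment {
        final int fragmentCount; // value of the fragment.count attribute
        final int recordCount;   // number of records inside the flow file

        Fragment(int fragmentCount, int recordCount) {
            this.fragmentCount = fragmentCount;
            this.recordCount = recordCount;
        }
    }

    // Check as described in the report: fragment.count is compared with the
    // total number of records in the bin.
    static boolean fullByRecordCount(List<Fragment> bin) {
        if (bin.isEmpty()) {
            return false;
        }
        int records = bin.stream().mapToInt(f -> f.recordCount).sum();
        return records >= bin.get(0).fragmentCount;
    }

    // Check the reporter expects in Defragment mode: fragment.count is
    // compared with the number of flow files in the bin.
    static boolean fullByFlowFileCount(List<Fragment> bin) {
        if (bin.isEmpty()) {
            return false;
        }
        return bin.size() >= bin.get(0).fragmentCount;
    }

    public static void main(String[] args) {
        // First of two fragments arrives; it already carries 2 records.
        List<Fragment> afterFirst = List.of(new Fragment(2, 2));
        System.out.println(fullByRecordCount(afterFirst));   // true  (released too early)
        System.out.println(fullByFlowFileCount(afterFirst)); // false (keeps waiting)

        // Both fragments have arrived.
        List<Fragment> afterSecond = List.of(new Fragment(2, 2), new Fragment(2, 2));
        System.out.println(fullByFlowFileCount(afterSecond)); // true
    }
}
```

With the record-based check the bin counts as complete after the first
fragment, which matches the "only 1 of 2 FlowFiles were encountered" error
quoted above.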



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
