[ https://issues.apache.org/jira/browse/NIFI-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582162#comment-14582162 ]
Michael Moser commented on NIFI-378:
------------------------------------
I think this will help explain what I saw. Suppose MergeContent is merging 3
files whose fragment.index attributes are:
file #1 fragment.index=1
file #2 fragment.index=3
file #3 fragment.index=2
Then it will sort the files based on fragment.index and put them in the
correct order (file #1, then file #3, then file #2).
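The ordering step can be illustrated with a small Python sketch. This is not NiFi's code; the (name, attributes) pairs and the helper sort_by_fragment_index are hypothetical stand-ins for FlowFiles, and the only assumption is that fragment.index is stored as a string that parses as an integer.

```python
# Hypothetical stand-ins for FlowFiles: (name, attributes) pairs.
files = [
    ("file #1", {"fragment.index": "1"}),
    ("file #2", {"fragment.index": "3"}),
    ("file #3", {"fragment.index": "2"}),
]


def sort_by_fragment_index(flow_files):
    """Return the files ordered by their numeric fragment.index."""
    # Attributes are strings, so convert to int before comparing;
    # otherwise "10" would sort before "2".
    return sorted(flow_files, key=lambda f: int(f[1]["fragment.index"]))


ordered = [name for name, _ in sort_by_fragment_index(files)]
print(ordered)  # ['file #1', 'file #3', 'file #2']
```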
For my use case, I had 1 processor generating fragment.index=1 and sending to
MergeContent. I had another processor generating fragment.index=2 and sending
to MergeContent. So if MergeContent sees these files:
file #1 fragment.index=1
file #2 fragment.index=1
file #3 fragment.index=1
file #4 fragment.index=2
file #5 fragment.index=2
file #6 fragment.index=2
I expected file #1 and file #4 to be merged, then file #2 and file #5, then
file #3 and file #6. However, the processor actually merged file #1 and file
#2, then file #3 and file #4, then file #5 and file #6.
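To make the difference concrete, here is a rough Python simulation of the two behaviors. It is only a model of what I observed, not the processor's implementation: the queue of (name, index) tuples and both function names are made up for illustration, and it assumes every file shares the same fragment.identifier and fragment.count=2.

```python
from collections import defaultdict, deque

# Hypothetical queue contents: (name, fragment.index), in arrival order.
queue = [
    ("file #1", 1), ("file #2", 1), ("file #3", 1),
    ("file #4", 2), ("file #5", 2), ("file #6", 2),
]
FRAGMENT_COUNT = 2


def bins_by_arrival(files, count):
    """Observed: a bin is complete after `count` files, whatever their index."""
    bins = [files[i:i + count] for i in range(0, len(files), count)]
    # Each bin is then sorted by index, but duplicates are not rejected.
    return [[name for name, _ in sorted(b, key=lambda f: f[1])] for b in bins]


def bins_by_distinct_index(files, count):
    """Expected: each bin takes one file for every index 1..count."""
    per_index = defaultdict(deque)
    for name, index in files:
        per_index[index].append(name)
    bins = []
    # Keep building bins while every index still has a file waiting.
    while all(per_index[i] for i in range(1, count + 1)):
        bins.append([per_index[i].popleft() for i in range(1, count + 1)])
    return bins


observed = bins_by_arrival(queue, FRAGMENT_COUNT)
expected = bins_by_distinct_index(queue, FRAGMENT_COUNT)
print(observed)  # [['file #1', 'file #2'], ['file #3', 'file #4'], ['file #5', 'file #6']]
print(expected)  # [['file #1', 'file #4'], ['file #2', 'file #5'], ['file #3', 'file #6']]
```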
Your understanding of our use case is correct. Maybe this is a violation of
the contract of this processor. I just didn't understand the contract and my
expectations were incorrect. But when MergeContent does a merge, it does not
actually check that all fragment.index attributes are unique; it just sorts
them.
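The kind of check I have in mind could look roughly like the Python sketch below. The helper duplicate_indexes is hypothetical and nothing like it is claimed to exist in MergeContent; it only shows how a bin's fragment.index values could be tested for uniqueness before merging.

```python
from collections import Counter


def duplicate_indexes(indexes):
    """Return the fragment.index values that occur more than once in a bin."""
    counts = Counter(indexes)
    return sorted(i for i, n in counts.items() if n > 1)


# A bin holding two fragments with fragment.index=1 would be flagged.
print(duplicate_indexes([1, 1]))  # [1]
# A bin holding one fragment per index passes.
print(duplicate_indexes([1, 2]))  # []
```

A bin for which this returns a non-empty list could then be held back or routed to failure instead of being merged; which of those is right depends on the processor's contract.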
> MergeContent in Defragment mode will merge fragments without checking index
> ---------------------------------------------------------------------------
>
> Key: NIFI-378
> URL: https://issues.apache.org/jira/browse/NIFI-378
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 0.0.1
> Reporter: Michael Moser
> Priority: Minor
>
> When in Defragment mode, the MergeContent processor looks for
> fragment.identifier and fragment.count attributes in order to place FlowFiles
> in the correct bin. The fragment.index attribute is ignored.
> If you happen to have many FlowFiles in the queue to MergeContent, and they
> all have fragment.identifier=foo and fragment.count=2, then it can merge two
> FlowFiles that both have fragment.index=1, or two FlowFiles that both have
> fragment.index=2.
> Granted this may seem odd. The use case is to give the MergeContent
> processor two input queues. We configure one queue to contain files with
> fragment.index=1 and the other queue to contain files with fragment.index=2.
> We want one file from each queue to be merged. Instead it will merge two
> files from the same queue.