[
https://issues.apache.org/jira/browse/NIFI-10081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572043#comment-17572043
]
Joe Witt commented on NIFI-10081:
---------------------------------
[~eneveu] Yeah, that makes perfect sense and is a super common case/pattern.
What I'm suggesting, though, is to schedule MergeContent to run all the time
rather than on a Cron Driven schedule. As is always the case when creating
batches of data, there are various characteristics one might use to determine
whether a batch is 'ready'. We support any or all of the following at once:
- Minimum Number of Entries
- Maximum Number of Entries
- Minimum size of the combined entries
- Maximum size of the combined entries
- Max Bin Age
In your case, you likely have a maximum size you'd like to see; for
instance, you may want no more than 10GB (or some other limit) batched together.
And you certainly want the batch to go out once it is 20 minutes old, so for
that, set 'Max Bin Age' to 20 mins.
This lets the processor bundle data as designed and gives you your desired
20-minute batches (or shorter ones, if enough data arrives to hit a size or
number-of-entries threshold first).
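To make the batching behavior above concrete, here is a minimal Python sketch of when a bin would be released under the thresholds described. It is a simplified model under stated assumptions, not NiFi's actual MergeContent code: `BinPolicy`, `Bin`, and `is_ready` are hypothetical names, the minimum thresholds are left out, and a bin is treated as ready as soon as any configured maximum or the age limit is reached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinPolicy:
    """Hypothetical stand-in for the batching properties (not a NiFi API)."""
    max_entries: Optional[int] = None      # cf. "Maximum Number of Entries"
    max_size_bytes: Optional[int] = None   # cf. maximum size of the combined entries
    max_bin_age_s: Optional[float] = None  # cf. "Max Bin Age", in seconds


@dataclass
class Bin:
    """Hypothetical bin that accumulates entries from the time it is created."""
    created_at: float
    count: int = 0
    size_bytes: int = 0

    def add(self, nbytes: int) -> None:
        # Record one more entry and its size.
        self.count += 1
        self.size_bytes += nbytes


def is_ready(b: Bin, policy: BinPolicy, now: float) -> bool:
    """Simplified readiness check: any configured maximum OR the age limit.

    Assumption: minimum thresholds are ignored in this sketch.
    """
    if policy.max_entries is not None and b.count >= policy.max_entries:
        return True
    if policy.max_size_bytes is not None and b.size_bytes >= policy.max_size_bytes:
        return True
    if policy.max_bin_age_s is not None and now - b.created_at >= policy.max_bin_age_s:
        return True
    return False


# 10GB cap (taken here as 10 * 1024**3 bytes) plus a 20-minute Max Bin Age.
policy = BinPolicy(max_size_bytes=10 * 1024**3, max_bin_age_s=20 * 60)
quiet = Bin(created_at=0.0)
quiet.add(1024)
print(is_ready(quiet, policy, now=600.0))   # False: small and only 10 minutes old
print(is_ready(quiet, policy, now=1200.0))  # True: reached the 20-minute age limit
```

In this model a lightly loaded bin goes out at the 20-minute mark, while a bin that fills to the size cap sooner is ready immediately, matching the "20-minute batches (or shorter ones)" behavior described in the comment.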
Now, having said this: presuming you're using ConsumeKafkaRecord (which you
most likely should be), you should pair it with MergeRecord. Similar
options and logic apply.
Long story short: this is a common, powerful, and important use case. You have
great options to choose from here that achieve high performance and reliable
outcomes.
Thanks
> MergeContent processor is not executed when Scheduling Strategy is set to
> Cron Driven.
> --------------------------------------------------------------------------------------
>
> Key: NIFI-10081
> URL: https://issues.apache.org/jira/browse/NIFI-10081
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.16.1, 1.16.2
> Environment: RHEL 7.9, Linux 3.10.0-1160.21.1.el7.x86_64
> java version "1.8.0_331"
> Java(TM) SE Runtime Environment (build 1.8.0_331-b09)
> Java HotSpot(TM) 64-Bit Server VM (build 25.331-b09, mixed mode)
> Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
> 6 cores
> Reporter: Angel Oropeza
> Priority: Minor
> Labels: crondriven, merge-content, scheduling
> Attachments: image-2022-06-01-22-22-32-958.png,
> image-2022-06-01-22-33-10-811.png, image-2022-06-01-22-33-47-519.png
>
>
> I was able to replicate my problem using the following configuration:
> !image-2022-06-01-22-22-32-958.png!
> The MergeContent configuration is as follows:
> !image-2022-06-01-22-33-10-811.png!
> !image-2022-06-01-22-33-47-519.png!
> Although the conditions of the MergeContent processor are met, it does not
> concatenate the incoming flowfiles.
> The last version in which the above flow worked was version 1.15.3.
> Any suggestions on how to solve this? Is this a bug?
> P.S.: The flow is designed this way because of a requirement to write only
> one file per day.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)