I agree that this would be a useful general feature. I also agree with Joe that format support should be limited to Version 3 due to the limitations of the earlier versions.
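Since the proposal is to limit format support to FlowFile Version 3, a rough sketch of what 1:1 packaging involves may help frame the discussion. This is a standalone approximation, not NiFi code. The class and method names are hypothetical. The byte layout is my reading of NiFi's `FlowFilePackagerV3`, which remains the authority: a `NiFiFF3` magic header, an attribute count, length-prefixed UTF-8 keys and values, an 8-byte big-endian content length, then the raw content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Hypothetical sketch of 1:1 FlowFile v3 packaging (one FlowFile in, one package out).
 * Layout assumed to match NiFi's FlowFilePackagerV3; verify against that class.
 */
public class FlowFileV3Writer {
    // Magic header written at the start of every v3 package.
    static final byte[] MAGIC_HEADER = {'N', 'i', 'F', 'i', 'F', 'F', '3'};
    // Lengths below this use 2 bytes; larger ones use a 0xFFFF marker plus 4 bytes.
    static final int MAX_VALUE_2_BYTES = 65535;

    public static void packageFlowFile(InputStream content, OutputStream out,
            Map<String, String> attributes, long contentLength) throws IOException {
        out.write(MAGIC_HEADER);
        writeFieldLength(out, attributes.size());
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            writeString(out, entry.getKey());
            writeString(out, entry.getValue());
        }
        // The caller must pass the exact content length; it is not checked here.
        writeLong(out, contentLength);
        content.transferTo(out); // InputStream.transferTo requires Java 9+
    }

    private static void writeString(OutputStream out, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeFieldLength(out, bytes.length);
        out.write(bytes);
    }

    private static void writeFieldLength(OutputStream out, int length) throws IOException {
        if (length < MAX_VALUE_2_BYTES) {
            // OutputStream.write(int) keeps only the low 8 bits of each value.
            out.write(length >>> 8);
            out.write(length);
        } else {
            out.write(0xFF);
            out.write(0xFF);
            out.write(length >>> 24);
            out.write(length >>> 16);
            out.write(length >>> 8);
            out.write(length);
        }
    }

    // 8-byte big-endian encoding of the content length.
    private static void writeLong(OutputStream out, long value) throws IOException {
        for (int shift = 56; shift >= 0; shift -= 8) {
            out.write((int) (value >>> shift));
        }
    }
}
```

Under this layout, packaging a FlowFile with a single `filename` attribute of `a.txt` and the 5-byte content `hello` yields 39 bytes (7 header + 2 count + 10 key + 7 value + 8 length + 5 content). A processor built around this would presumably also update `mime.type`. I believe `application/flowfile-v3` is the value MergeContent uses for its FlowFile Stream v3 format, but that is worth confirming.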
This is definitely something that would be useful on the 1.x support branch to provide a smooth upgrade path for NiFi 2.
This general topic also came up on the dev channel on the Apache NiFi Slack group: https://apachenifi.slack.com/archives/C0L9S92JY/p1692115270146369
One key thing to note from that discussion is supporting interoperability with services outside of NiFi. That may be too much of a stretch for an initial implementation, but it is something I am planning to evaluate as time allows.
For now, something focused narrowly on FlowFile Version 3 encoding seems like the best approach. I recommend referencing this discussion in a new Jira issue and outlining the general design goals.
Regards,
David Handermann
On Fri, Sep 8, 2023 at 1:11 PM Adam Taft <a...@adamtaft.com> wrote:
>
> And also ... if we can land this in a 1.x release, this would help tremendously to those who are going to need a replacement for PostHTTP and don't want to "go dark" when they make the transition.
>
> That is, without this processor in 1.x, when a user upgrades from 1.x to 2.x, they will either have to have a MergeContent/InvokeHTTP solution in place already to replace PostHTTP, or they will have to take a (hopefully short) outage when they bring their canvas back up (removing PostHTTP and replacing with PackageFlowFile + InvokeHTTP).
>
> With this processor in 1.x, they can make that transition while PostHTTP is still available on their canvas. Wishful thinking that we can make the entire journey from 1.x to 2.x as smooth as possible, but this could potentially help some.
>
>
> On Fri, Sep 8, 2023 at 10:55 AM Adam Taft <a...@adamtaft.com> wrote:
>
> > +1 on this as well. It's something I've kind of griped about before (with the loss of PostHTTP).
> >
> > I don't think it would be horrible (as per Joe's concern) to offer an N:1 "bundling" property. It would just have to be stupid simple. No "groups", timeouts, correlation attributes, minimum entries, etc. It should just basically call the ProcessSession#get(int maxResults) where "maxResults" is a configurable property. Whatever number of flowfiles returned in the list is what is "bundled" into FFv3 format for output.
> >
> > /Adam
> >
> >
> > On Fri, Sep 8, 2023 at 7:19 AM Phillip Lord <phillord0...@gmail.com> wrote:
> >
> >> +1 from me.
> >> I’ve experimented with both methods. The simplicity of a PackageFlowfile straight up 1:1 is convenient and straightforward.
> >> MergeContent on the other hand can be difficult to understand and tweak appropriately to gain desired results/throughput.
> >> On Sep 8, 2023 at 10:14 AM -0400, Joe Witt <joe.w...@gmail.com>, wrote:
> >> > Ok. Certainly simplifies it but likely makes it applicable to larger flowfiles only. The format is meant to allow appending and result in large sets of flowfiles for io efficiency and specifically for storage as the small files/tons of files thing can cause poor performance pretty quickly (10s of thousands of files in a single directory).
> >> >
> >> > But maybe that simplicity is fine and we just link to the MergeContent packaging option if users need more.
> >> >
> >> > On Fri, Sep 8, 2023 at 7:06 AM Michael Moser <moser...@gmail.com> wrote:
> >> >
> >> > > I was thinking 1 file in -> 1 flowfile-v3 file out. No merging of multiple files at all. Probably change the mime.type attribute. It might not even have any config properties at all if we only support flowfile-v3 and not v1 or v2.
> >> > >
> >> > > -- Mike
> >> > >
> >> > >
> >> > > On Fri, Sep 8, 2023 at 9:56 AM Joe Witt <joe.w...@gmail.com> wrote:
> >> > >
> >> > > > Mike
> >> > > >
> >> > > > In user terms this makes sense to me. I'd only bother with v3 or whatever is latest. We want to dump the old code. And if there are seriously older versions v1,v2 then nifi 1.x can be used.
> >> > > >
> >> > > > The challenge is that you end up needing some of the same complexity in implementation and config of merge content I think. What did you have in mind for that?
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > > On Fri, Sep 8, 2023 at 6:53 AM Michael Moser <moser...@gmail.com> wrote:
> >> > > >
> >> > > > > Devs,
> >> > > > >
> >> > > > > I can't find if this was suggested before, so here goes. With the demise of PostHTTP in NiFi 2.0, the recommended alternative is to MergeContent 1 file into FlowFile-v3 format then InvokeHTTP. What does the community think about supporting a new PackageFlowFile processor that is simple to configure (compared to MergeContent!) and simply packages flowfile attributes + content into a FlowFile-v[1,2,3] format? This would also offer a simple way to export flowfiles from NiFi that could later be re-ingested and recovered using UnpackContent. I don't want to submit a PR for such a processor without first asking the community whether this would be acceptable.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > -- Mike
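For the export and re-ingest case Mike mentions, the same layout can be read back. V3 packages can simply be appended to one another, which is the property Joe points out. A reader therefore just loops until the stream ends. The same property should also let Adam's N:1 bundling work without extra framing. This sketch is hypothetical and is not NiFi's `FlowFileUnpackerV3` or `UnpackContent`. It assumes the v3 layout described for NiFi's packager, and it buffers each content body in memory, which a real implementation would avoid.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for one or more back-to-back FlowFile v3 packages (sketch only). */
public class FlowFileV3Reader {
    static final byte[] MAGIC_HEADER = {'N', 'i', 'F', 'i', 'F', 'F', '3'};

    /** One decoded FlowFile; content is held fully in memory for simplicity. */
    public static final class Unpacked {
        public final Map<String, String> attributes;
        public final byte[] content;

        Unpacked(Map<String, String> attributes, byte[] content) {
            this.attributes = attributes;
            this.content = content;
        }
    }

    public static List<Unpacked> readAll(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<Unpacked> result = new ArrayList<>();
        byte[] header = new byte[MAGIC_HEADER.length];
        int first;
        // End of stream is only treated as clean when it falls between packages.
        while ((first = data.read()) != -1) {
            header[0] = (byte) first;
            data.readFully(header, 1, header.length - 1);
            if (!Arrays.equals(header, MAGIC_HEADER)) {
                throw new IOException("Missing FlowFile v3 magic header");
            }
            int attributeCount = readFieldLength(data);
            Map<String, String> attributes = new LinkedHashMap<>();
            for (int i = 0; i < attributeCount; i++) {
                String key = readString(data);
                attributes.put(key, readString(data));
            }
            long contentLength = data.readLong(); // 8-byte big-endian length
            // Math.toIntExact rejects bodies too large for a byte array.
            byte[] content = new byte[Math.toIntExact(contentLength)];
            data.readFully(content);
            result.add(new Unpacked(attributes, content));
        }
        return result;
    }

    // 2-byte length, or 0xFFFF followed by a 4-byte length for large fields.
    private static int readFieldLength(DataInputStream data) throws IOException {
        int value = data.readUnsignedShort();
        return value == 0xFFFF ? data.readInt() : value;
    }

    private static String readString(DataInputStream data) throws IOException {
        byte[] bytes = new byte[readFieldLength(data)];
        data.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Feeding it two packages written back to back returns a list of two entries, each with its own attributes and content. That is the same shape a "bundled" output from `ProcessSession#get(int maxResults)` would produce if each FlowFile in the returned list were packaged in turn into one output stream.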