I agree that this would be a useful general feature. I also agree with
Joe that format support should be limited to Version 3 due to the
limitations of the earlier versions.

This is definitely something that would be useful on the 1.x support
branch to provide a smooth upgrade path for NiFi 2.

This general topic also came up on the dev channel on the Apache NiFi
Slack group:

https://apachenifi.slack.com/archives/C0L9S92JY/p1692115270146369

One key thing to note from that discussion is supporting
interoperability with services outside of NiFi. That may be too much
of a stretch for an initial implementation, but it is something I am
planning to evaluate as time allows.

For now, something focused narrowly on FlowFile Version 3 encoding
seems like the best approach.

I recommend referencing this discussion in a new Jira issue and
outlining the general design goals.

Regards,
David Handermann


On Fri, Sep 8, 2023 at 1:11 PM Adam Taft <a...@adamtaft.com> wrote:
>
> And also ... if we can land this in a 1.x release, it would tremendously
> help those who are going to need a replacement for PostHTTP and
> don't want to "go dark" when they make the transition.
>
> That is, without this processor in 1.x, when a user upgrades from 1.x to
> 2.x, they will either have to have a MergeContent/InvokeHTTP solution in
> place already to replace PostHTTP, or they will have to take a (hopefully
> short) outage when they bring their canvas back up (removing PostHTTP and
> replacing with PackageFlowFile + InvokeHTTP).
>
> With this processor in 1.x, they can make that transition while PostHTTP is
> still available on their canvas. Wishful thinking that we can make the
> entire journey from 1.x to 2.x as smooth as possible, but this could
> potentially help some.
>
>
> On Fri, Sep 8, 2023 at 10:55 AM Adam Taft <a...@adamtaft.com> wrote:
>
> > +1 on this as well. It's something I've kind of griped about before (with
> > the loss of PostHTTP).
> >
> > I don't think it would be horrible (as per Joe's concern) to offer an N:1
> > "bundling" property. It would just have to be stupid simple. No "groups",
> > timeouts, correlation attributes, minimum entries, etc. It should just
> > call ProcessSession#get(int maxResults), where "maxResults" is a
> > configurable property. Whatever number of flowfiles is returned in the
> > list is what gets "bundled" into FFv3 format for output.
> >
> > /Adam
> >
> >
> > On Fri, Sep 8, 2023 at 7:19 AM Phillip Lord <phillord0...@gmail.com>
> > wrote:
> >
> >> +1 from me.
> >> I’ve experimented with both methods. The simplicity of a PackageFlowFile
> >> straight up 1:1 is convenient and straightforward.
> >> MergeContent on the other hand can be difficult to understand and tweak
> >> appropriately to gain desired results/throughput.
> >> On Sep 8, 2023 at 10:14 AM -0400, Joe Witt <joe.w...@gmail.com>, wrote:
> >> > Ok. Certainly simplifies it but likely makes it applicable to larger
> >> > flowfiles only. The format is meant to allow appending and result in
> >> > large sets of flowfiles for io efficiency and specifically for storage
> >> > as the small files/tons of files thing can cause poor performance
> >> > pretty quickly (10s of thousands of files in a single directory).
> >> >
> >> > But maybe that simplicity is fine and we just link to the MergeContent
> >> > packaging option if users need more.
> >> >
> >> > On Fri, Sep 8, 2023 at 7:06 AM Michael Moser <moser...@gmail.com> wrote:
> >> >
> >> > > I was thinking 1 file in -> 1 flowfile-v3 file out. No merging of
> >> > > multiple files at all. Probably change the mime.type attribute. It
> >> > > might not even have any config properties at all if we only support
> >> > > flowfile-v3 and not v1 or v2.
> >> > >
> >> > > -- Mike
> >> > >
> >> > >
> >> > > On Fri, Sep 8, 2023 at 9:56 AM Joe Witt <joe.w...@gmail.com> wrote:
> >> > >
> >> > > > Mike
> >> > > >
> >> > > > In user terms this makes sense to me. I'd only bother with v3 or
> >> > > > whatever is latest. We want to dump the old code. And if there are
> >> > > > seriously older versions v1,v2 then nifi 1.x can be used.
> >> > > >
> >> > > > The challenge is that you end up needing some of the same
> >> > > > complexity in implementation and config of merge content I think.
> >> > > > What did you have in mind for that?
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > > On Fri, Sep 8, 2023 at 6:53 AM Michael Moser <moser...@gmail.com> wrote:
> >> > > >
> >> > > > > Devs,
> >> > > > >
> >> > > > > I can't find if this was suggested before, so here goes. With the
> >> > > > > demise of PostHTTP in NiFi 2.0, the recommended alternative is to
> >> > > > > MergeContent 1 file into FlowFile-v3 format then InvokeHTTP. What
> >> > > > > does the community think about supporting a new PackageFlowFile
> >> > > > > processor that is simple to configure (compared to MergeContent!)
> >> > > > > and simply packages flowfile attributes + content into a
> >> > > > > FlowFile-v[1,2,3] format? This would also offer a simple way to
> >> > > > > export flowfiles from NiFi that could later be re-ingested and
> >> > > > > recovered using UnpackContent. I don't want to submit a PR for
> >> > > > > such a processor without first asking the community whether this
> >> > > > > would be acceptable.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > -- Mike
> >> > > > >
> >> > > >
> >> > >
> >>
> >
