Hi Joe,

For my use case partial results are okay. The files may contain up to a
million records, but we have about a day to process each one. We will
consider record-based processing, though converting our flows to consume
records instead of single files may be a longer task. Will I need multiple
sessions to handle all this?

Thanks,
Eric
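[A minimal sketch of the commit-after-transfer pattern Joe describes below,
written against the plain ProcessSession API. A single session can be
committed repeatedly inside one onTrigger call, so a second session shouldn't
be needed. The relationship names and the parseRecords() helper are
placeholders, not a real API, and error handling is omitted:

import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.stream.io.StreamUtils;

public class EmitPerRecordProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("One JSON flowfile per record").build();
    static final Relationship REL_ORIGINAL = new Relationship.Builder()
            .name("original").description("The input file after parsing").build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList(REL_SUCCESS, REL_ORIGINAL)));
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session)
            throws ProcessException {
        final FlowFile input = session.get();
        if (input == null) {
            return;
        }

        // Buffer the content first: the session cannot be committed while a
        // content stream is still open.
        final ByteArrayOutputStream content = new ByteArrayOutputStream();
        session.read(input, in -> StreamUtils.copy(in, content));

        // Every flowfile must be accounted for before a commit, so the input
        // is transferred up front. This is exactly the partial-results risk:
        // if processing dies at record 51, records 1-50 are already downstream
        // and the original is already marked done.
        session.transfer(input, REL_ORIGINAL);
        session.commit();

        for (final byte[] record : parseRecords(content.toByteArray())) {
            FlowFile child = session.create(); // no parent: lineage to the input is lost
            child = session.write(child, out -> out.write(record));
            session.transfer(child, REL_SUCCESS);
            session.commit(); // the record becomes visible downstream immediately
        }
    }

    // Hypothetical helper: splits the buffered input into JSON records.
    private List<byte[]> parseRecords(final byte[] content) {
        throw new UnsupportedOperationException("parsing logic not shown");
    }
}

Note Joe's performance caveat applies directly here: a commit per record means
a repository update per record, which adds up fast at a million records.]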
On Fri, Mar 5, 2021 at 12:30 PM Joe Witt <[email protected]> wrote:
> Eric
>
> The ProcessSession follows a unit-of-work pattern. You can do a lot
> of things, but until you commit the session it won't actually commit the
> change(s). So if you want the behavior you describe, call commit after
> each transfer. This is done automatically for you in most cases, but
> you can call it yourself to control the boundary. Just remember you
> risk partial results then. Consider you're reading an input file which
> contains, let's say, 100 records. On record 51 there is a processing
> issue. What happens then? I'd also suggest this pattern generally
> results in poor performance. Can you not use the record
> readers/writers to accomplish this, so you can avoid turning it into
> a bunch of tiny flowfiles?
>
> Thanks
>
> On Fri, Mar 5, 2021 at 1:19 PM Eric Secules <[email protected]> wrote:
> >
> > Hello,
> >
> > I am trying to write a processor which parses an input file and emits
> > one JSON flowfile for each record in the input file. Currently we're
> > calling session.transfer() once we encounter a fragment we want to
> > emit, but it's not sending the new flowfiles to the next processor as
> > it processes the input flowfile. Instead it's holding everything until
> > the input is fully processed and releasing it all at once. Is there
> > some way I can write the processor to emit flowfiles as soon as
> > possible, rather than waiting for everything to succeed?
> >
> > Thanks,
> > Eric
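[And a rough sketch of the record reader/writer route Joe suggests, in the
style of NiFi's built-in record processors (ConvertRecord and friends):
records are still handled one at a time, but they stay inside a single
flowfile, so a million records never become a million flowfiles. The property
descriptors and the per-record transform are assumptions, and the factory
method signatures are from the 1.x API, so check them against your NiFi
version:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.serialization.RecordReader;
import org.apache.nifi.serialization.RecordReaderFactory;
import org.apache.nifi.serialization.RecordSetWriter;
import org.apache.nifi.serialization.RecordSetWriterFactory;
import org.apache.nifi.serialization.record.Record;
import org.apache.nifi.serialization.record.RecordSchema;

public class TransformRecordsProcessor extends AbstractProcessor {

    static final PropertyDescriptor RECORD_READER = new PropertyDescriptor.Builder()
            .name("record-reader").displayName("Record Reader")
            .identifiesControllerService(RecordReaderFactory.class)
            .required(true).build();
    static final PropertyDescriptor RECORD_WRITER = new PropertyDescriptor.Builder()
            .name("record-writer").displayName("Record Writer")
            .identifiesControllerService(RecordSetWriterFactory.class)
            .required(true).build();

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").build();

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return Arrays.asList(RECORD_READER, RECORD_WRITER);
    }

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session)
            throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }

        final RecordReaderFactory readerFactory = context.getProperty(RECORD_READER)
                .asControllerService(RecordReaderFactory.class);
        final RecordSetWriterFactory writerFactory = context.getProperty(RECORD_WRITER)
                .asControllerService(RecordSetWriterFactory.class);
        final FlowFile original = flowFile;

        // Rewrite the content in place: stream records in, transform, stream out.
        flowFile = session.write(flowFile, (in, out) -> {
            try (RecordReader reader = readerFactory.createRecordReader(original, in, getLogger())) {
                final RecordSchema schema =
                        writerFactory.getSchema(original.getAttributes(), reader.getSchema());
                try (RecordSetWriter writer = writerFactory.createWriter(
                        getLogger(), schema, out, original.getAttributes())) {
                    writer.beginRecordSet();
                    Record record;
                    while ((record = reader.nextRecord()) != null) {
                        // per-record transformation would go here
                        writer.write(record);
                    }
                    writer.finishRecordSet();
                }
            } catch (final Exception e) {
                throw new ProcessException("Failed to transform records", e);
            }
        });

        session.transfer(flowFile, REL_SUCCESS);
        // No explicit commit: the framework commits when onTrigger returns.
    }
}

With the single implicit commit at the end, the whole file stays atomic: if
record 51 is malformed, the framework rolls the session back and nothing
partial goes downstream.]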
