Hi Joe,

For my use case partial results are okay. The files may contain up to a
million records, but we have about a day to process each one. We will
consider record-based processing, though converting our flows to consume
records instead of single files may be a longer task. Will I need multiple
sessions to handle all this?

Thanks,
Eric
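[A minimal sketch of the commit-after-transfer pattern Joe describes below,
written against the plain ProcessSession API. A single session can be
committed repeatedly inside one onTrigger call, so a second session shouldn't
be needed. The relationship names and the parseRecords() helper are
placeholders, not a real API, and error handling is omitted:

import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.stream.io.StreamUtils;

public class EmitPerRecordProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("One JSON flowfile per record").build();
    static final Relationship REL_ORIGINAL = new Relationship.Builder()
            .name("original").description("The input file after parsing").build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList(REL_SUCCESS, REL_ORIGINAL)));
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session)
            throws ProcessException {
        final FlowFile input = session.get();
        if (input == null) {
            return;
        }

        // Buffer the content first: the session cannot be committed while a
        // content stream is still open.
        final ByteArrayOutputStream content = new ByteArrayOutputStream();
        session.read(input, in -> StreamUtils.copy(in, content));

        // Every flowfile must be accounted for before a commit, so the input
        // is transferred up front. This is exactly the partial-results risk:
        // if processing dies at record 51, records 1-50 are already downstream
        // and the original is already marked done.
        session.transfer(input, REL_ORIGINAL);
        session.commit();

        for (final byte[] record : parseRecords(content.toByteArray())) {
            FlowFile child = session.create(); // no parent: lineage to the input is lost
            child = session.write(child, out -> out.write(record));
            session.transfer(child, REL_SUCCESS);
            session.commit(); // the record becomes visible downstream immediately
        }
    }

    // Hypothetical helper: splits the buffered input into JSON records.
    private List<byte[]> parseRecords(final byte[] content) {
        throw new UnsupportedOperationException("parsing logic not shown");
    }
}

Note Joe's performance caveat applies directly here: a commit per record means
a repository update per record, which adds up fast at a million records.]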
On Fri, Mar 5, 2021 at 12:30 PM Joe Witt <[email protected]> wrote:
> Eric
>
> The ProcessSession follows a unit-of-work pattern. You can do a lot
> of things, but until you commit the session it won't actually commit the
> change(s). So if you want the behavior you describe, call commit after
> each transfer. This is done automatically for you in most cases, but
> you can call it yourself to control the boundary. Just remember you
> risk partial results then. Consider you're reading an input file which
> contains, let's say, 100 records. On record 51 there is a processing
> issue. What happens then? I'd also suggest this pattern generally
> results in poor performance. Can you not use the record
> readers/writers to accomplish this, so you can avoid turning it into
> a bunch of tiny flowfiles?
>
> Thanks
>
> On Fri, Mar 5, 2021 at 1:19 PM Eric Secules <[email protected]> wrote:
> >
> > Hello,
> >
> > I am trying to write a processor which parses an input file and emits
> > one JSON flowfile for each record in the input file. Currently we're
> > calling session.transfer() once we encounter a fragment we want to
> > emit, but it's not sending the new flowfiles to the next processor as
> > it processes the input flowfile. Instead it's holding everything until
> > the input is fully processed and releasing it all at once. Is there
> > some way I can write the processor to emit flowfiles as soon as
> > possible, rather than waiting for everything to succeed?
> >
> > Thanks,
> > Eric
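[And a rough sketch of the record reader/writer route Joe suggests, in the
style of NiFi's built-in record processors (ConvertRecord and friends):
records are still handled one at a time, but they stay inside a single
flowfile, so a million records never become a million flowfiles. The property
descriptors and the per-record transform are assumptions, and the factory
method signatures are from the 1.x API, so check them against your NiFi
version:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.serialization.RecordReader;
import org.apache.nifi.serialization.RecordReaderFactory;
import org.apache.nifi.serialization.RecordSetWriter;
import org.apache.nifi.serialization.RecordSetWriterFactory;
import org.apache.nifi.serialization.record.Record;
import org.apache.nifi.serialization.record.RecordSchema;

public class TransformRecordsProcessor extends AbstractProcessor {

    static final PropertyDescriptor RECORD_READER = new PropertyDescriptor.Builder()
            .name("record-reader").displayName("Record Reader")
            .identifiesControllerService(RecordReaderFactory.class)
            .required(true).build();
    static final PropertyDescriptor RECORD_WRITER = new PropertyDescriptor.Builder()
            .name("record-writer").displayName("Record Writer")
            .identifiesControllerService(RecordSetWriterFactory.class)
            .required(true).build();

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").build();

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return Arrays.asList(RECORD_READER, RECORD_WRITER);
    }

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session)
            throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }

        final RecordReaderFactory readerFactory = context.getProperty(RECORD_READER)
                .asControllerService(RecordReaderFactory.class);
        final RecordSetWriterFactory writerFactory = context.getProperty(RECORD_WRITER)
                .asControllerService(RecordSetWriterFactory.class);
        final FlowFile original = flowFile;

        // Rewrite the content in place: stream records in, transform, stream out.
        flowFile = session.write(flowFile, (in, out) -> {
            try (RecordReader reader = readerFactory.createRecordReader(original, in, getLogger())) {
                final RecordSchema schema =
                        writerFactory.getSchema(original.getAttributes(), reader.getSchema());
                try (RecordSetWriter writer = writerFactory.createWriter(
                        getLogger(), schema, out, original.getAttributes())) {
                    writer.beginRecordSet();
                    Record record;
                    while ((record = reader.nextRecord()) != null) {
                        // per-record transformation would go here
                        writer.write(record);
                    }
                    writer.finishRecordSet();
                }
            } catch (final Exception e) {
                throw new ProcessException("Failed to transform records", e);
            }
        });

        session.transfer(flowFile, REL_SUCCESS);
        // No explicit commit: the framework commits when onTrigger returns.
    }
}

With the single implicit commit at the end, the whole file stays atomic: if
record 51 is malformed, the framework rolls the session back and nothing
partial goes downstream.]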
