Tim,

All operations on FlowFiles are performed along session boundaries. Nothing is passed to the next Processor until the session is committed, even if you have already called session.transfer(). This way, if the session is rolled back, everything is restored to the way it was.
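To make the commit semantics concrete, here is a toy model in plain Java. It is not the NiFi API: `ToySession`, `transfer`, `commit`, `rollback` and the `downstream` list are hypothetical stand-ins for a ProcessSession and the connection queue feeding the next Processor. It only illustrates the behavior described above. Transferred items are staged, become visible downstream only on commit, and are discarded on rollback.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionModel {

    // Hypothetical stand-in for a NiFi ProcessSession (NOT the real API).
    static class ToySession {
        private final List<String> downstream; // stands in for the queue to the next Processor
        private final List<String> staged = new ArrayList<>();

        ToySession(List<String> downstream) {
            this.downstream = downstream;
        }

        // Stage an output; the next Processor cannot see it yet.
        void transfer(String flowFile) {
            staged.add(flowFile);
        }

        // Release everything staged in this session downstream at once.
        void commit() {
            downstream.addAll(staged);
            staged.clear();
        }

        // Discard staged outputs; nothing ever reached the next Processor.
        void rollback() {
            staged.clear();
        }
    }

    public static void main(String[] args) {
        List<String> downstream = new ArrayList<>();

        ToySession session = new ToySession(downstream);
        session.transfer("line-1");
        session.transfer("line-2");
        System.out.println("after transfer: " + downstream.size());
        session.commit();
        System.out.println("after commit:   " + downstream.size());

        ToySession failed = new ToySession(downstream);
        failed.transfer("line-3");
        failed.rollback();
        System.out.println("after rollback: " + downstream.size());
    }
}
```

Running `main` prints a downstream count of 0 after the two transfers, 2 after the commit, and still 2 after the rolled-back session.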
So for your case, it may make sense to process the larger file without splitting it into individual records, if you can.

There is another solution, albeit a bit less clean: instead of splitting each FlowFile directly into 1-line FlowFiles, you could use SplitText to split each one into chunks of, say, 10,000 lines, and then use a second SplitText to split each of those chunks into 1-line FlowFiles. This way, you avoid having millions of FlowFiles buffered up in a single session at once.

Hope this helps!

-Mark

----------------------------------------
> Date: Sun, 13 Sep 2015 20:23:45 -0700
> From: [email protected]
> To: [email protected]
> Subject: RE: custom processor - parse flowFile to many kafka messages
>
> Thanks for all the feedback. Looking at the source code for SplitText, I see
> that it parses the input FlowFile, storing the created output FlowFiles in a
> list, and then at the end sends the list all at once with a single call to
> session.transfer(). This could be a problem when there are millions of
> records in the input file.
>
> Is there a technical reason why SplitText creates all the output flow files
> before sending them out? If I were to write my own split process, or a
> combination of GetFile and SplitText where I read the input file line by
> line, can I create an output flow file, send it out, then create the next
> one, send it out, etc?
>
> Does the next processor in the flow get the flow file as soon as it is sent
> with session.transfer?
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/custom-processor-parse-flowFile-to-many-kafka-messages-tp2782p2803.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
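The benefit of the two-stage SplitText approach can be sketched by counting how many FlowFiles a single session has to hold before it commits. The Java below is a hypothetical model, not NiFi code: it assumes each incoming FlowFile (and, in the second stage, each chunk) is split in its own session, and it represents FlowFiles simply as lists of lines. `split`, `peakOneStage` and `peakTwoStage` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoStageSplit {

    // Split lines into consecutive chunks of at most chunkSize lines each.
    static List<List<String>> split(List<String> lines, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += chunkSize) {
            chunks.add(lines.subList(i, Math.min(i + chunkSize, lines.size())));
        }
        return chunks;
    }

    // One stage: a single session stages one output per line.
    static int peakOneStage(List<String> lines) {
        return split(lines, 1).size();
    }

    // Two stages: the first session stages one output per chunk; then each
    // chunk is split into single lines in its own (assumed) separate session.
    static int peakTwoStage(List<String> lines, int chunkSize) {
        List<List<String>> chunks = split(lines, chunkSize);
        int peak = chunks.size(); // outputs staged by the stage-1 session
        for (List<String> chunk : chunks) {
            peak = Math.max(peak, split(chunk, 1).size()); // one stage-2 session per chunk
        }
        return peak;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            lines.add("record-" + i);
        }
        System.out.println("one stage, peak per session: " + peakOneStage(lines));
        System.out.println("two stages, peak per session: " + peakTwoStage(lines, 10_000));
    }
}
```

With 1,000,000 records, the one-stage split stages 1,000,000 outputs in one session, while the two-stage split with 10,000-line chunks never stages more than 10,000 (100 chunks in the first session, then 10,000 lines per chunk session). The total number of 1-line FlowFiles produced is the same; only the peak held by any one session changes.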
