Tim,

There are some changes in 0.3.0 that greatly improve NiFi's performance when dealing with tons of small FlowFiles, like this. That being said, if you are still concerned about the number of FlowFiles that you are having to process, you can in fact have a single FlowFile that contains many messages to put to Kafka.

The PutKafka processor exposes a "Message Delimiter" property that you can use to tell it how to split up the messages in the FlowFile. So if your messages are new-line delimited, for instance, you can use a new-line as your delimiter, and each line in the FlowFile will be sent to Kafka as a separate message.

Thanks
-Mark
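[Editor's note: the delimiter-based splitting Mark describes can be sketched in plain Java. This is a minimal sketch of the idea only, not NiFi's actual implementation — it assumes new-line delimited content and holds it in memory, whereas the real processor streams the FlowFile content; the helper name is hypothetical.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimiterSplit {

    // Split one FlowFile's content into individual Kafka messages using a
    // configurable delimiter, mirroring the idea behind PutKafka's
    // "Message Delimiter" property.
    static List<String> splitMessages(String flowFileContent, String delimiter) {
        // Pattern.quote so the delimiter is treated literally, not as a regex.
        return Arrays.asList(flowFileContent.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        String content = "msg1\nmsg2\nmsg3";
        List<String> messages = splitMessages(content, "\n");
        // Each line becomes one Kafka message: 3 messages from 1 FlowFile.
        System.out.println(messages.size()); // 3
    }
}
```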
> Date: Sat, 12 Sep 2015 07:53:59 -0400
> Subject: Re: custom processor - parse flowFile to many kafka messages
> From: [email protected]
> To: [email protected]
>
> Tim
>
> Yep. What you're looking to do should be pretty straightforward.
>
> One potential option, if all 'rows' from the text file are line
> oriented, would be:
>
> GetFile --> SplitText --> YourCustomProc --> PutKafka
>
> SplitText will take the input file, regardless of size, and split it
> into a flowfile per line (or lines, depending on config). This way
> each flowFile entering your custom processor will be one 'row'. Once
> your custom processor has done its thing, presumably converting the
> input format to your desired JSON output, you can then write these
> resulting flowfiles to Kafka.
>
> Depending on the sophistication of the conversion, you may even be able
> to avoid creating a custom processor altogether and simply use the
> ExtractText and ReplaceText processors in its place. With ExtractText
> you can parse out values of your rows into FlowFile attributes, and
> with ReplaceText you can take FlowFile attributes and replace the
> actual content using the Expression Language. It gives you a good bit
> of power and control, but not as much as you can give yourself, of
> course, in a custom processor.
>
> If you need help with a template showing this, or would like to talk
> through various peak performance considerations, just let us know.
>
> Thanks
> Joe
>
> On Sat, Sep 12, 2015 at 6:00 AM, Rick Braddy <[email protected]> wrote:
> > Tim,
> >
> > Based on what you describe, and not being familiar with Kafka or your
> > application, it sounds like breaking each row into a flowfile could make
> > sense, depending upon what you're needing to do downstream. There is
> > overhead associated with each FlowFile, as well as a provenance
> > consideration for what level of granularity you want for the flows.
> > If there's a more logical way to group multiple JSON objects together as
> > multiple rows, that may be more efficient.
> >
> > For throughput reasons, if you have a huge number of rows converting to
> > separate flowfiles, you may want to consider "batching" flowfile creation
> > within your processor (look at how GetFile does this, for example). This
> > way, each time your processor's onTrigger method gets called, your
> > processor can quickly process and emit NNN number of JSON objects and then
> > relinquish control.
> >
> > You said the incoming text file is "very large" - not sure if that's in
> > MBs, GBs, or TBs. Keep in mind that it will have to be read entirely into
> > the content repository by GetFile before processing, and then your
> > processor will have to deal with streaming that huge file line by line,
> > parsing and creating the JSON objects. Not sure if you can accomplish this
> > using the standard NiFi building blocks and Expression Language, but it
> > might be possible.
> >
> > Hope that helps.
> > Rick
> >
> > -----Original Message-----
> > From: timF [mailto:[email protected]]
> > Sent: Saturday, September 12, 2015 1:50 AM
> > To: [email protected]
> > Subject: custom processor - parse flowFile to many kafka messages
> >
> > I need to create a custom processor.
> >
> > GetFile --> MyProcessor --> PutKafka
> >
> > The incoming flowFile will be a very large text file. Each row of the file
> > will need to be parsed, put into its own JSON object, and then sent to a
> > Kafka topic. My question is the following: do I need to write each JSON
> > object to its own output flowFile? That is, if the input file contains N
> > rows and I want N messages to show up in the Kafka topic, do I create N
> > output flowFiles?
> >
> > --
> > View this message in context:
> > http://apache-nifi-developer-list.39713.n7.nabble.com/custom-processor-parse-flowFile-to-many-kafka-messages-tp2782.html
> > Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
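[Editor's note: Rick's batching suggestion — process and emit a bounded number of rows per onTrigger call, then relinquish control — can be sketched in plain Java. NiFi classes are omitted so the sketch is self-contained; the `row` field name, batch size, and hand-rolled JSON escaping are illustrative assumptions (a real processor would read from the FlowFile's InputStream and use a JSON library such as Jackson).]

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BatchedRowsToJson {

    // Read up to batchSize rows per invocation (analogous to one onTrigger
    // call), turning each row into its own small JSON document. Stopping at
    // batchSize keeps each invocation short so the processor yields control
    // quickly, as Rick suggests.
    static List<String> nextBatch(BufferedReader reader, int batchSize) throws IOException {
        List<String> jsonObjects = new ArrayList<>();
        String line;
        while (jsonObjects.size() < batchSize && (line = reader.readLine()) != null) {
            // Minimal escaping for the sketch only.
            String escaped = line.replace("\\", "\\\\").replace("\"", "\\\"");
            jsonObjects.add("{\"row\":\"" + escaped + "\"}");
        }
        return jsonObjects;
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for streaming the large file line by line.
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne"));
        // First "onTrigger": emit at most 3 JSON objects, then relinquish control.
        System.out.println(nextBatch(reader, 3)); // [{"row":"a"}, {"row":"b"}, {"row":"c"}]
        // Next invocation picks up where the stream left off.
        System.out.println(nextBatch(reader, 3)); // [{"row":"d"}, {"row":"e"}]
    }
}
```

Whether each JSON object then becomes its own FlowFile (timF's original question) or many objects share one delimiter-separated FlowFile is the trade-off discussed above: per-FlowFile overhead and provenance granularity versus PutKafka's "Message Delimiter" splitting.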
