Tim,

Yep.  What you're looking to do should be pretty straightforward.

One option, if all 'rows' in the text file are line-oriented, would
be:

GetFile --> SplitText --> YourCustomProc --> PutKafka.

SplitText will take the input file, regardless of size, and split it
into one FlowFile per line (or per group of lines, depending on
config).  This way each FlowFile entering your custom processor will
be one 'row'.  Once your custom processor has done its thing,
presumably converting the input format to your desired JSON output,
you can then write the resulting FlowFiles to Kafka.
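
As a rough illustration of the per-row conversion step, here is a
minimal plain-Java sketch.  It does not use the NiFi API; it only
shows the kind of logic a custom processor might apply to each
one-line FlowFile.  The comma-delimited layout and the column names
(id, name, city) are assumptions made up for the example, since the
actual row format was not described.

```java
import java.util.ArrayList;
import java.util.List;

final class RowToJson {
    // Hypothetical column names; the real row layout is not known.
    static final String[] COLUMNS = {"id", "name", "city"};

    // Escape the characters that must not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Convert one comma-delimited row (assumed format) to one JSON object.
    static String toJson(String row) {
        String[] fields = row.split(",", -1);
        if (fields.length != COLUMNS.length) {
            throw new IllegalArgumentException("unexpected field count: " + row);
        }
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(COLUMNS[i]).append("\":\"")
              .append(escape(fields[i].trim())).append('"');
        }
        return sb.append('}').toString();
    }

    // Stand-in for "one row in, one message out": N non-blank lines give N JSON strings.
    static List<String> convert(String fileContent) {
        List<String> out = new ArrayList<>();
        for (String line : fileContent.split("\\r?\\n")) {
            if (!line.isEmpty()) {
                out.add(toJson(line));
            }
        }
        return out;
    }
}
```

In the flow above, SplitText would do the line splitting, so only
something like toJson would live in the custom processor; convert is
there just to show the N rows to N messages relationship.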

Depending on the sophistication of the conversion you may even be able
to avoid creating a custom processor altogether and simply use the
ExtractText and ReplaceText processors in its place.  With ExtractText
you can parse values out of your rows into FlowFile attributes, and
with ReplaceText you can take those FlowFile attributes and replace
the actual content using the expression language.  That gives you a
good bit of power and control, though of course not as much as you
would have in a custom processor.
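
To make the extract-then-replace idea concrete, here is a small
plain-Java analogue.  It is not NiFi code and does not reproduce the
processors' exact behavior: it pulls regex capture groups into a map
of "attributes" and then fills a template containing ${name}
references.  The attribute prefix "row", the regex and the template
are all assumptions chosen for the example, and the sketch does no
JSON escaping of the substituted values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractReplaceSketch {
    // Matches ${name} references in a template.
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    // Store each capture group as an attribute named "<prefix>.<group index>".
    static Map<String, String> extract(String prefix, Pattern pattern, String content) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = pattern.matcher(content);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    attrs.put(prefix + "." + g, m.group(g));
                }
            }
        }
        return attrs;
    }

    // Replace every ${name} in the template with the attribute value (empty if absent).
    static String replace(String template, Map<String, String> attrs) {
        Matcher m = REF.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = attrs.getOrDefault(m.group(1), "");
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For a row such as "42,alice", a pattern like ^(\d+),(\w+)$ and a
template like {"id":"${row.1}","name":"${row.2}"} would yield one JSON
object per row.  If your rows need real escaping or conditional
logic, that is where a custom processor starts to pay off.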

If you need help with a template showing this or would like to talk
through various peak performance considerations just let us know.

Thanks
Joe

On Sat, Sep 12, 2015 at 6:00 AM, Rick Braddy <[email protected]> wrote:
> Tim,
>
> Based on what you describe, and not being familiar with Kafka or your 
> application, it sounds like breaking each row into a flowfile could make 
> sense, depending upon what you're needing to do downstream.  There is 
> overhead associated with each FlowFile, as well as a provenance consideration 
> for what level of granularity you want for the flows.  If there's a more 
> logical way to group multiple JSON objects together as multiple rows that may 
> be more efficient.
>
> For throughput reasons, if you have a huge number of rows converting to 
> separate flowfiles, you may want to consider "batching" flowfile creation 
> within your processor (look at how GetFile does this, for example).  This 
> way, each time your processor's onTrigger method gets called, your processor 
> can quickly process and emit NNN number of JSON objects then relinquish 
> control.
>
> You said the incoming text file is "very large" - not sure if that's in MB, 
> GB or TB. Keep in mind that it will have to be read entirely into the 
> content repository by GetFile before processing, and then your processor will 
> have to deal with streaming that huge file in line by line, parsing and 
> creating the JSON objects.  Not sure if you can accomplish this using the 
> standard NiFi building blocks and expression language, but it might be possible.
>
> Hope that helps.
> Rick
>
> -----Original Message-----
> From: timF [mailto:[email protected]]
> Sent: Saturday, September 12, 2015 1:50 AM
> To: [email protected]
> Subject: custom processor - parse flowFile to many kafka messages
>
> I need to create a custom processor.
>
> GetFile --> MyProcessor --> PutKafka
>
> The incoming flowFile will be a very large text file. Each row of the file 
> will need to be parsed, put into its own JSON object, and then sent to a 
> Kafka topic.  My question is the following: Do I need to write each JSON 
> object to its own output flowFile?  That is, if the input file contains N 
> rows, and I want N messages to show up in the Kafka topic, do I create N 
> output flowFiles?
>
>
>
> --
> View this message in context: 
> http://apache-nifi-developer-list.39713.n7.nabble.com/custom-processor-parse-flowFile-to-many-kafka-messages-tp2782.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
