Tim,

Based on what you describe, and not being familiar with Kafka or your 
application, it sounds like breaking each row into its own FlowFile could make 
sense, depending on what you need to do downstream.  There is overhead 
associated with each FlowFile, and there is also a provenance consideration: 
what level of granularity do you want to track for the flow?  If there's a more 
logical way to group multiple JSON objects together (one per row) in a single 
FlowFile, that may be more efficient.
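To make that trade-off concrete, here is a rough, framework-independent Java 
sketch that groups per-row JSON strings into newline-delimited chunks, so N rows 
become ceil(N / K) FlowFile payloads instead of N.  JsonRowGrouper and 
groupIntoChunks are made-up names for illustration; whether the downstream 
processor can split such a chunk back into individual Kafka messages is 
something you'd need to check for your setup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: packs per-row JSON strings into newline-delimited
// chunks, one JSON object per line, at most recordsPerChunk objects per chunk.
final class JsonRowGrouper {

    private JsonRowGrouper() {}

    static List<String> groupIntoChunks(List<String> jsonRows, int recordsPerChunk) {
        if (recordsPerChunk <= 0) {
            throw new IllegalArgumentException("recordsPerChunk must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < jsonRows.size(); start += recordsPerChunk) {
            // The last chunk may hold fewer than recordsPerChunk rows.
            int end = Math.min(start + recordsPerChunk, jsonRows.size());
            chunks.add(String.join("\n", jsonRows.subList(start, end)));
        }
        return chunks;
    }
}
```

For example, 10 rows with recordsPerChunk = 4 yield three chunks holding 4, 4 
and 2 JSON objects.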

For throughput reasons, if you have a huge number of rows converting to 
separate FlowFiles, you may want to consider "batching" FlowFile creation 
within your processor (look at how GetFile does this, for example).  This way, 
each time your processor's onTrigger method is called, your processor can 
quickly process and emit a batch of NNN JSON objects and then relinquish control.
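A rough sketch of the batching idea, kept independent of the NiFi API so it 
stands alone: each call emits at most maxPerCall rows and then returns, the way 
a single onTrigger invocation would.  BatchedRowEmitter and emitBatch are 
hypothetical names.  In an actual processor the emit step would presumably 
create a child FlowFile with ProcessSession.create(parent), fill it with 
session.write(...), and transfer it to a success relationship; how you keep your 
position in the input between onTrigger calls is left open here.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Framework-independent sketch of per-trigger batching: handle a bounded
// number of rows per call, then return control to the caller (scheduler).
final class BatchedRowEmitter {

    private BatchedRowEmitter() {}

    /** Emits up to maxPerCall rows; returns how many were emitted (0 once exhausted). */
    static int emitBatch(Iterator<String> rows, int maxPerCall, Consumer<String> emit) {
        int emitted = 0;
        while (emitted < maxPerCall && rows.hasNext()) {
            emit.accept(rows.next()); // in NiFi: create, write and transfer a FlowFile
            emitted++;
        }
        return emitted;
    }
}
```

With five rows and maxPerCall = 2, successive calls return 2, 2, 1 and then 0 
once the input is exhausted.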

You said the incoming text file is "very large"; I'm not sure if that's MBs, 
GBs, or TBs.  Keep in mind that GetFile will read it entirely into the content 
repository before your processor sees it, and your processor will then have to 
stream that huge file in line by line, parsing each line and creating the 
JSON objects.  I'm not sure whether you can accomplish this using the standard 
NiFi building blocks and Expression Language, but it might be possible.
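To illustrate the line-by-line approach, here is a minimal Java sketch that 
streams an InputStream through a BufferedReader and turns each non-blank row 
into a JSON string, so memory use tracks the longest line rather than the file 
size.  The tab-separated row layout, the field names, and the StreamingRowParser 
class are assumptions for illustration only; inside a processor you would likely 
call something like this from the InputStreamCallback passed to 
session.read(flowFile, ...).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Sketch: stream rows one at a time instead of loading the whole file.
// Row format (tab-separated) and field names are hypothetical.
final class StreamingRowParser {

    private StreamingRowParser() {}

    /** Hands one JSON string per non-blank row to sink; returns the row count. */
    static long parse(InputStream in, String[] fieldNames, Consumer<String> sink) throws IOException {
        long count = 0;
        // The reader is not closed here: the caller owns the underlying stream.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            sink.accept(rowToJson(line, fieldNames));
            count++;
        }
        return count;
    }

    /** Maps tab-separated values onto fieldNames as a flat JSON object of strings. */
    static String rowToJson(String line, String[] fieldNames) {
        String[] values = line.split("\t", -1);
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < fieldNames.length && i < values.length; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(escape(fieldNames[i])).append("\":\"")
                .append(escape(values[i])).append('"');
        }
        return json.append('}').toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    private static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

For instance, the row "alice<TAB>42" with field names name and age produces 
{"name":"alice","age":"42"}.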

Hope that helps.
Rick

-----Original Message-----
From: timF [mailto:[email protected]] 
Sent: Saturday, September 12, 2015 1:50 AM
To: [email protected]
Subject: custom processor - parse flowFile to many kafka messages

I need to create a custom processor.  

GetFile --> MyProcessor --> PutKafka

The incoming flowFile will be a very large text file. Each row of the file will 
need to be parsed, put into its own JSON object, and then sent to a Kafka 
topic.  My question is the following: do I need to write each JSON object to 
its own output flowFile?  That is, if the input file contains N rows and I want 
N messages to show up in the Kafka topic, do I create N output flowFiles?



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/custom-processor-parse-flowFile-to-many-kafka-messages-tp2782.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
