Hello,

my first mailing here. I am a Java developer, using Apache Velocity, Drill,
Tomcat, Ant, Pentaho ETL, MongoDB, MySQL and more, and I am very much a data guy.

I have used NiFi for a while now and started coding my first processor
yesterday. I am basically doing it to widen my knowledge and learn something new.

I started with the idea of combining Apache Velocity - a template engine - with
NiFi. So in comes a CSV file, it gets merged with a template containing
formatting information and some placeholders (and maybe some limited logic), and
out comes a new set of data, formatted differently. That separates the
processing logic from the formatting. One could create HTML, XML, JSON or other
text-based formats from it. Easy to use and very efficient.
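To illustrate the merge step, here is a minimal standalone sketch (the field
names and the template are made up, not from my actual processor):

import java.io.StringWriter;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.VelocityEngine;

public class VelocityDemo {
    public static void main(String[] args) {
        VelocityEngine engine = new VelocityEngine();
        engine.init();

        // One CSV row, already parsed into fields (dummy data)
        VelocityContext ctx = new VelocityContext();
        ctx.put("name", "Alice");
        ctx.put("city", "Berlin");

        // The template carries the formatting; Velocity fills the placeholders
        String template = "<tr><td>$name</td><td>$city</td></tr>";
        StringWriter out = new StringWriter();
        engine.evaluate(ctx, out, "csv-demo", template);
        System.out.println(out); // <tr><td>Alice</td><td>Berlin</td></tr>
    }
}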

Now my question is: should I implement the logic so that the processor handles
a whole CSV file - which usually has multiple lines? That would be good for the
user, as he or she has to deal with only one processor doing the work. But the
logic would be more specialized.
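In that variant the template itself could loop over the rows, e.g. with
Velocity's #foreach (just a sketch, assuming the processor exposes the parsed
rows as a list named $rows):

<table>
#foreach( $row in $rows )
  <tr><td>$row.name</td><td>$row.city</td></tr>
#end
</table>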

The other way around, I could code the processor to handle one row of the CSV
file, and the user would have to come up with a flow that divides the CSV file
into multiple flowfiles before my processor is used. That is less specialized,
but it requires more preparation work from the user.
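The per-row variant would keep onTrigger() quite small. A rough sketch (the
hard-coded template, REL_SUCCESS and the naive split(",") are just placeholders,
error handling omitted):

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.*;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.io.StreamCallback;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.VelocityEngine;

public class VelocityRowProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS =
            new Relationship.Builder().name("success").build();

    private final VelocityEngine engine = new VelocityEngine();

    public VelocityRowProcessor() {
        engine.init();
    }

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session)
            throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        flowFile = session.write(flowFile, new StreamCallback() {
            @Override
            public void process(InputStream in, OutputStream out) throws IOException {
                // The incoming flowfile holds exactly one CSV row
                String row = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8)).readLine();

                // Expose the fields to the template and merge
                VelocityContext ctx = new VelocityContext();
                ctx.put("fields", Arrays.asList(row.split(",")));
                Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
                engine.evaluate(ctx, writer, "velocity-row",
                        "<tr><td>$fields.get(0)</td><td>$fields.get(1)</td></tr>");
                writer.flush();
            }
        });
        session.transfer(flowFile, REL_SUCCESS);
    }
}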

I tend to go the second way, also because there is already a processor
(SplitText, if I am not mistaken) that will split a file into multiple
flowfiles. But I wanted to hear your opinion on the best way to go. Do you have
a recommendation for me? (Maybe the answer is to do both?!)

Thanks for sharing your thoughts.

Uwe
