Hello,
I am new to Apache NiFi (and somewhat new to Java), so forgive my ignorance.  I
want to use NiFi to process large files. My university has some research work
that could be written as NiFi processors.  I know "large" can be a relative
term, but for the resources I have, these files are large: about 700 MB to
1 GB.  I *had* processors that took these large files and split them into
smaller files, then performed our algorithms (which I ported over to Java),
then extracted some data, and finally did almost an "if-else" branching. Seems
great for NiFi, right? Unfortunately, I am seeing problems where NiFi appears
to be I/O bound. I think this is because of the write-ahead log (walog), the
provenance log, and all the transactions that are recorded there after each
processor. I then essentially combined my processors into one large processor,
which I know goes against the grain of NiFi.
I wonder if I should use some of these annotations:
@SideEffectFree
@SupportsBatching

Maybe I should have used @SupportsBatching instead of throwing my code into 
one big processor?
Even if the above does help with the I/O problems, I still like having one
large NiFi processor that does a lot of work. One benefit of using one big
processor is that I can use the InputStreamCallback in session.read to load my
large files into a java.nio.ByteBuffer. A ByteBuffer is useful in my case
because I can keep my data off heap and run my algorithms a chunk (or
".slice()") at a time.  Hopefully you are familiar with this class. If I had
many small processors (with the grain), each one would have to call
session.read with an InputStreamCallback and read the InputStream again, which
is not as efficient as using a ByteBuffer.
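
To make the ByteBuffer idea concrete, here is a minimal sketch of the pattern I
mean, using only the JDK. It is not my actual processor code: the class and
method names (DirectBufferChunks, loadOffHeap, chunk, sumBytes) are made up for
illustration, sumBytes is a stand-in for a real algorithm, and a
ByteArrayInputStream stands in for the InputStream that the session.read
callback would hand me.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper class, for illustration only.
class DirectBufferChunks {

    // Copies up to `capacity` bytes from the stream into an off-heap (direct) buffer.
    static ByteBuffer loadOffHeap(InputStream in, int capacity) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(capacity);
        ReadableByteChannel channel = Channels.newChannel(in);
        while (buffer.hasRemaining() && channel.read(buffer) != -1) {
            // keep reading until the buffer is full or the stream ends
        }
        buffer.flip(); // switch to reading: position = 0, limit = bytes read
        return buffer;
    }

    // Returns a view of `length` bytes starting at `offset`; no bytes are copied.
    static ByteBuffer chunk(ByteBuffer source, int offset, int length) {
        ByteBuffer view = source.duplicate(); // own position/limit, shared memory
        view.position(offset);
        view.limit(offset + length);
        return view.slice();
    }

    // Stand-in for a real algorithm (an assumption): sums the unsigned byte values.
    static long sumBytes(ByteBuffer chunk) {
        long sum = 0;
        while (chunk.hasRemaining()) {
            sum += chunk.get() & 0xFF;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteBuffer buffer = loadOffHeap(new ByteArrayInputStream(data), data.length);

        // Walk the buffer one chunk at a time; the last chunk may be shorter.
        int chunkSize = 4;
        for (int offset = 0; offset < buffer.limit(); offset += chunkSize) {
            int length = Math.min(chunkSize, buffer.limit() - offset);
            long sum = sumBytes(chunk(buffer, offset, length));
            System.out.println("offset " + offset + ": sum = " + sum);
        }
    }
}
```

Each chunk is only a view over the same off-heap memory, so the file content is
copied once (out of the InputStream) and never again. The catch is that a
single ByteBuffer is limited to Integer.MAX_VALUE bytes, which is enough for my
700 MB to 1 GB files but not for anything much larger.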

If I'm reading things correctly, FlowFileController.getContent returns an 
InputStream, so that's not bad. My concern is that I will have many processors 
(with many threads) reading InputStreams and then having many objects waiting 
to be garbage collected. 


So, as you can imagine, I am going off on tangents. I was wondering if you have 
ideas to reduce the I/O I'm seeing, and what you think of my use of ByteBuffers. 
I also wonder if I could pass the ByteBuffer around in a FlowFile?
Thanks,
Ravi
