Hello,

I am new to Apache NiFi (and somewhat new to Java), so forgive my ignorance. I want to use NiFi to process large files. My university has some research work that could be written as NiFi processors. I know "large" is a relative term, but for the resources I have, these files are large: about 700 MB to 1 GB.

I *had* processors that took these large files and split them into smaller files, then ran our algorithms (which I ported to Java), then extracted some data, and finally did something close to an "if-else" branch. Seems great for NiFi, right? Unfortunately, I am seeing problems where NiFi appears to be I/O bound. I think this is because of the write-ahead log (walog) and the provenance log, and all the transactions that are recorded there after each processor.

I then essentially combined my processors into one large one, which I know goes against the grain of NiFi. I wonder if I should use some of these: SideEffectFree SupportsBatching
and maybe I should've used SupportsBatching instead of throwing my code into one big processor?

Even if the above does help with the I/O problems, I still like having one large NiFi processor that does a lot of work. One benefit of a single big processor is that I can use the callback in session.read to load my large files into a java.nio.ByteBuffer. A ByteBuffer is useful in my case because I can keep my data off heap and run my algorithms one chunk (or ".slice()") at a time. Hopefully you are familiar with this class.

If I had many small processors (with the grain), I would have to use session.read and InputStreamCallback in every one of them and read the InputStream each time, which is not as efficient as using a ByteBuffer. If I'm reading things correctly, FlowFileController.getContent returns an InputStream, so that's not bad. My concern is that I will have many processors (with many threads) reading InputStreams, leaving many objects waiting to be garbage collected.

As you can imagine, I am going off on tangents. I was wondering if you have ideas for reducing the I/O I'm seeing, and what you think of my use of ByteBuffers. Could I pass the ByteBuffer around in a FlowFile?

Thanks,
Ravi
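
P.S. To make the ByteBuffer part concrete, here is a minimal sketch of the pattern I mean, written without any NiFi classes so it runs on its own. The class name OffHeapChunks, the method names, and the byte-summing process() step are placeholders for illustration, not my real code. In a processor I would expect loadOffHeap to be called on the InputStream handed to the session.read callback, with the size known up front.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

final class OffHeapChunks {

    // Copy a stream of known size into a direct (off-heap) buffer.
    static ByteBuffer loadOffHeap(InputStream in, int size) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(size);
        ReadableByteChannel channel = Channels.newChannel(in);
        // Read until the buffer is full or the stream ends.
        while (buf.hasRemaining() && channel.read(buf) != -1) {
            // nothing else to do per iteration
        }
        buf.flip(); // switch from writing into the buffer to reading from it
        return buf;
    }

    // Walk the buffer one slice at a time without copying the bytes.
    static long sumChunks(ByteBuffer data, int chunkSize) {
        long total = 0;
        // duplicate() shares the content but has its own position and limit,
        // so the caller's buffer is left untouched.
        ByteBuffer view = data.duplicate();
        while (view.hasRemaining()) {
            int len = Math.min(chunkSize, view.remaining());
            ByteBuffer chunk = view.slice(); // starts at the current position
            chunk.limit(len);                // restrict the slice to one chunk
            total += process(chunk);
            view.position(view.position() + len);
        }
        return total;
    }

    // Placeholder for the real algorithm: sums the unsigned byte values.
    static long process(ByteBuffer chunk) {
        long sum = 0;
        while (chunk.hasRemaining()) {
            sum += chunk.get() & 0xFF;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = new byte[10];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        ByteBuffer buf = loadOffHeap(new ByteArrayInputStream(bytes), bytes.length);
        System.out.println(sumChunks(buf, 4));
    }
}
```

The point of the sketch is that each slice is only a view onto the same off-heap memory, so the per-chunk work allocates a small wrapper object rather than a new copy of the data.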
