Mike,

Regarding the licensing, I believe LGPL is a no-go for Apache projects.
Take a look here: https://www.apache.org/legal/resolved.html#category-x

-Bryan

On Sat, Oct 28, 2017 at 4:47 PM, Mike Thomsen <mikerthom...@gmail.com> wrote:

> The processor breaks down a much larger file into a huge number of small
> data points. We're talking about turning a 1.1M-line file into roughly 2.5B
> data points.
>
> My current approach is "read a file with GetFile, save to /tmp, break it down
> into a bunch of large CSV record batches (a few hundred thousand
> records per group)," and then commit.
>
> It's slow, but with some good debugging statements, I can see the processor
> tearing into the data just fine. However, I am thinking about adding a
> variant to this which would be an "iterative" version following
> this pattern:
>
> "Read the file, save to /tmp, load the file, keep the current read position
> intact, and on every onTrigger call send out a batch with session.commit() until
> the file is done being read. Then grab the next flowfile."
>
> Does anyone have any suggestions on good practices to follow here,
> potential concerns, etc.? (Note: I have to write the file to /tmp because a
> library I am using, which I don't want to fork, doesn't have an API that can
> read from a stream rather than a java.io.File.)
>
> Also, are there any issues with accepting a contribution that makes use of
> an LGPL-licensed library, in the event that my client wants to open source
> it (we think they will)?
>
> Thanks,
>
> Mike
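The "iterative" pattern Mike describes — keep a read position open across invocations and emit one bounded batch per call — can be sketched outside of NiFi as a plain Java class. This is a hypothetical illustration, not NiFi API code: `IterativeBatcher` and `nextBatch` are invented names, and each `nextBatch()` call stands in for one `onTrigger()` invocation that would be followed by `session.commit()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the iterative pattern: the reader (and thus the
// current read position) survives across calls, and each call emits at most
// one batch, analogous to one batch per onTrigger() + session.commit().
public class IterativeBatcher {
    private final BufferedReader reader;
    private final int batchSize;
    private boolean done = false;

    public IterativeBatcher(Path file, int batchSize) throws IOException {
        this.reader = Files.newBufferedReader(file);
        this.batchSize = batchSize;
    }

    // Returns the next batch of up to batchSize lines. When the file is
    // exhausted, marks itself done -- the point at which the processor
    // would move on to the next flowfile.
    public List<String> nextBatch() throws IOException {
        List<String> batch = new ArrayList<>();
        String line;
        while (batch.size() < batchSize && (line = reader.readLine()) != null) {
            batch.add(line);
        }
        if (batch.size() < batchSize) {
            done = true;
            reader.close();
        }
        return batch;
    }

    public boolean isDone() { return done; }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("batcher", ".csv");
        Files.write(tmp, List.of("a", "b", "c", "d", "e"));
        IterativeBatcher b = new IterativeBatcher(tmp, 2);
        int batches = 0;
        while (!b.isDone()) {
            if (!b.nextBatch().isEmpty()) batches++;
        }
        System.out.println(batches); // 3: [a, b], [c, d], [e]
        Files.delete(tmp);
    }
}
```

One practical concern with this pattern in a real processor is failure handling: since each batch is committed independently, a crash mid-file means the already-committed batches have left the processor, so the restart logic has to either track the read position durably or tolerate re-emitting data.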