Mike,

Regarding the licensing, I believe LGPL is a no-go for Apache projects.

Take a look here:
https://www.apache.org/legal/resolved.html#category-x

-Bryan


On Sat, Oct 28, 2017 at 4:47 PM, Mike Thomsen <mikerthom...@gmail.com> wrote:
> The processor breaks down a much larger file into a huge number of small
> data points; we're talking about turning a 1.1M-line file into roughly 2.5B
> data points.
>
> My current approach is "read a file with GetFile, save it to /tmp, break it
> down into a bunch of large CSV record batches (a few hundred thousand
> records per group)" and then commit.
>
> It's slow, and with some good debugging statements, I can see the processor
> tearing into the data just fine. However, I am thinking about adding an
> "iterative" variant that would follow this pattern:
>
> "read the file, save to /tmp, load the file, keep the current read position
> intact, every onTrigger call sends out a batch w/ session.commit() until
> it's done reading. Then grab the next flowfile."
>
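> Something like this rough sketch is what I have in mind (untested and just
> illustrative: BATCH_SIZE, the class name, and the plain line-copying are
> placeholders for the real library call that parses the file, and it assumes
> a single concurrent task so the member state is safe):
>
> import java.io.BufferedReader;
> import java.io.File;
> import java.io.IOException;
> import java.io.OutputStream;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.util.Collections;
> import java.util.Set;
>
> import org.apache.nifi.annotation.behavior.TriggerSerially;
> import org.apache.nifi.flowfile.FlowFile;
> import org.apache.nifi.processor.AbstractProcessor;
> import org.apache.nifi.processor.ProcessContext;
> import org.apache.nifi.processor.ProcessSession;
> import org.apache.nifi.processor.Relationship;
> import org.apache.nifi.processor.exception.ProcessException;
>
> @TriggerSerially // member state below assumes a single concurrent task
> public class IterativeSplitProcessor extends AbstractProcessor {
>
>     static final Relationship REL_SUCCESS =
>             new Relationship.Builder().name("success").build();
>
>     private static final int BATCH_SIZE = 250_000; // placeholder batch size
>
>     private File tempFile;         // the copy saved to /tmp
>     private BufferedReader reader; // read position kept across onTrigger calls
>
>     @Override
>     public void onTrigger(ProcessContext context, ProcessSession session)
>             throws ProcessException {
>         try {
>             if (reader == null) {
>                 FlowFile incoming = session.get();
>                 if (incoming == null) {
>                     return; // nothing queued yet
>                 }
>                 tempFile = File.createTempFile("batch-", ".csv");
>                 session.exportTo(incoming, tempFile.toPath(), false);
>                 session.remove(incoming); // caveat: if NiFi dies after this
>                 session.commit();         // commit, the unread rest is lost
>                 reader = Files.newBufferedReader(tempFile.toPath());
>             }
>
>             // One batch out per trigger, committed immediately.
>             FlowFile batch = session.create();
>             batch = session.write(batch,
>                     out -> copyLines(reader, out, BATCH_SIZE));
>             session.transfer(batch, REL_SUCCESS);
>             session.commit();
>
>             if (!reader.ready()) { // crude EOF check, fine for a sketch
>                 reader.close();
>                 reader = null;
>                 tempFile.delete(); // done; next trigger grabs the next file
>             }
>         } catch (IOException e) {
>             throw new ProcessException(e);
>         }
>     }
>
>     // Copy up to max lines from the reader into the outgoing batch.
>     private void copyLines(BufferedReader in, OutputStream out, int max)
>             throws IOException {
>         String line;
>         for (int i = 0; i < max && (line = in.readLine()) != null; i++) {
>             out.write(line.getBytes(StandardCharsets.UTF_8));
>             out.write('\n');
>         }
>     }
>
>     @Override
>     public Set<Relationship> getRelationships() {
>         return Collections.singleton(REL_SUCCESS);
>     }
> }
>
> The member state is why I'd pin the processor to a single concurrent task.
> And as the comment notes, removing the original flowfile before all the
> batches are out means a restart mid-file drops the remainder, which is one
> of the concerns I'm asking about.
>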
> Does anyone have any suggestions on good practices to follow here,
> potential concerns, etc.? (Note: I have to write the file to /tmp because a
> library I'm using, which I don't want to fork, only exposes an API that
> reads from a java.io.File rather than from a stream.)
>
> Also, are there any issues with accepting a contribution that makes use of
> an LGPL-licensed library, in the event that my client wants to open-source
> it (we think they will)?
>
> Thanks,
>
> Mike
