I'm experimenting with Arrow in an online streaming analytics project. It
looks like it could be a great fit, but I'm hitting a few
roadblocks.

A goal for my project is to stream data into pre-allocated arrays in memory
and include them in analytics with arrays stored elsewhere (db, disk, etc).
I'm currently hitting a barrier with Array Builders. They can't produce an
array without resetting their memory (of course, correct me if I'm wrong).
This forces me to couple the sizes of my buffers in memory/disk with the
query rate rather than something more logical like time elapsed, bytes, or
number of elements.

Ultimately, I'd like to have array builders be able to produce an unsafe
array which shares the same memory buffers. I'd have to protect
resizes/copies/resets with mutexes, but I believe it should work. Is this a
case you'd like to support with Arrow? I have an incomplete PR
<https://github.com/clutchski/arrow/pull/1/files> which illustrates the
idea, and which I could complete if you all think it's a reasonable direction.
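
To make the locking idea concrete, here is a minimal sketch in plain Go. It
does not use the Arrow Go API and is not the code from the PR: the
Int64Builder type and its Append, View, and Reset methods are hypothetical
stand-ins, and a plain []int64 stands in for an Arrow buffer. It only shows
one way a builder could expose its values without copying or resetting, with
a read/write mutex guarding resizes and resets.

```go
package main

import (
	"fmt"
	"sync"
)

// Int64Builder is a hypothetical stand-in for an array builder.
// It is NOT the Arrow Go API; it only illustrates the locking scheme.
type Int64Builder struct {
	mu   sync.RWMutex
	data []int64 // pre-allocated backing buffer
}

// NewInt64Builder pre-allocates room for capacity values.
func NewInt64Builder(capacity int) *Int64Builder {
	return &Int64Builder{data: make([]int64, 0, capacity)}
}

// Append adds one value. It may grow (reallocate) the buffer, so it takes
// the write lock.
func (b *Int64Builder) Append(v int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.data = append(b.data, v)
}

// View runs fn against the values appended so far, sharing the builder's
// memory instead of copying it. The read lock is held while fn runs, so
// Append and Reset block until the query finishes. fn must not retain the
// slice after it returns.
func (b *Int64Builder) View(fn func(values []int64)) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	fn(b.data[:len(b.data):len(b.data)])
}

// Reset drops all values but keeps the allocated buffer for reuse.
func (b *Int64Builder) Reset() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.data = b.data[:0]
}

// sum is a toy "query" over a view.
func sum(values []int64) int64 {
	var total int64
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	b := NewInt64Builder(1024)
	for i := int64(1); i <= 3; i++ {
		b.Append(i)
	}
	// Query the in-progress data without resetting the builder.
	b.View(func(values []int64) {
		fmt.Println(len(values), sum(values)) // prints "3 6"
	})
	// The builder keeps accepting data after the query.
	b.Append(4)
	b.View(func(values []int64) {
		fmt.Println(len(values), sum(values)) // prints "4 10"
	})
}
```

In this sketch the buffer size is chosen independently of how often queries
run, which is the decoupling I'm after. The trade-off is that a long-running
query holds the read lock and stalls appends for its duration.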

One other question: the Go implementation seems pretty incomplete. If I
wanted to better understand Arrow's full capabilities, which implementation
is most complete? C++?

Thanks for all the work so far.
Matt
