ArrowBuf is final on purpose: it guarantees a single implementation, and it is there for performance reasons. An ArrowBuf is primarily a memory address and a length, and reads and writes against that memory should involve zero indirection.
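To make the "address plus length" idea concrete, here is a minimal sketch. It is not Arrow code: AddrLenView and its methods are hypothetical names, and a java.nio.ByteBuffer stands in for the raw memory address so the example runs without Unsafe. It only illustrates a final class that holds a reference to existing memory plus a length and reads from it without copying.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; this is not Arrow's ArrowBuf.
// Declared final so there is exactly one implementation of the read path.
final class AddrLenView {
    private final ByteBuffer memory; // assumption: stands in for a raw memory address
    private final int length;        // number of readable bytes

    private AddrLenView(ByteBuffer memory, int length) {
        this.memory = memory;
        this.length = length;
    }

    // Wraps the readable bytes of an existing buffer; slice() shares the
    // underlying memory, so no data is copied.
    static AddrLenView wrap(ByteBuffer existing) {
        ByteBuffer view = existing.slice().order(ByteOrder.LITTLE_ENDIAN);
        return new AddrLenView(view, view.remaining());
    }

    int length() {
        return length;
    }

    // Reads a little-endian int at the given byte offset after a bounds check.
    int getInt(int index) {
        if (index < 0 || index > length - Integer.BYTES) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return memory.getInt(index);
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN);
        source.putInt(0, 7).putInt(4, 42);

        AddrLenView view = AddrLenView.wrap(source);
        System.out.println(view.getInt(4)); // prints 42
    }
}
```

The sketch only mirrors the address-plus-length shape of ArrowBuf.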
ArrowBuf does, however, wrap several types of substructure, as long as they have that property. For example, an ArrowBuf currently almost always wraps a Netty UnsafeDirectLittleEndian object. At that level, you could propose a way to wrap more types of memory address + length.

On Thu, Sep 6, 2018, 10:26 PM Zhenyuan Zhao <[email protected]> wrote:

> Hello Team,
>
> I'm working on using arrow as intermediate format for transferring columnar
> data from server to client. In this case, the client will only need to read
> from the format so I would like to avoid any unnecessary copy of the data.
> Looking into arrow, while arrow-format/flatbuffers does support zero copy,
> current arrow-vector java implementation is not. I was trying to hack zero
> copy for readonly scenarios, but saw two main blockers:
>
> 1. ArrowBuf is the only buffer implementation used exclusively across
>    ArrowReader/ArrowRecordBatch/Vectors. It's final, which means there
>    isn't a way for me to override its logic in order to wrap some existing
>    buffer. It's absolutely necessary to use ArrowBuf for write scenarios
>    due to buffer allocation, but for read, I was hoping vector can just
>    serve as view on top of existing memory buffer (like java ByteBuffer or
>    netty ByteBuf). Seems safe for read only case.
>
> 2. As a result of #1 <https://github.com/apache/arrow/pull/1> described
>    above, the only layer which seems reusable is the arrow-format. Then I
>    have to implement effectively a readonly copy of arrow-vector that
>    references existing buffer. Put aside the effort doing that, it
>    introduces a big gap to keep up with future changes/fixes made to
>    arrow-vector.
>
> Wondering if you guys have put any thoughts into such readonly scenarios.
> Any suggestion how I can approach this myself?
>
> Thanks
