Hello Team,

While arrow-format/FlatBuffers supports zero copy, the current arrow-vector 
Java implementation does not. I was trying to hack in zero copy for read-only 
scenarios, but ran into two main blockers:

1. ArrowBuf is the only buffer implementation, and it is used exclusively 
across ArrowReader/ArrowRecordBatch/Vectors. It's final, so there is no way 
for me to override its logic in order to wrap an existing buffer. Using 
ArrowBuf is absolutely necessary for write scenarios, but for reads, if I 
already have the data in a ByteBuffer or a netty ByteBuf, I would like to 
avoid the copy that creates an ArrowBuf from it (I am mainly referring to the 
deserialization logic inside MessageSerializer).

2. As a result of #1 above, the only layer that seems reusable is 
arrow-format. I would then have to implement what is effectively a read-only 
copy of arrow-vector that references the existing buffer. Setting aside the 
effort involved, such a copy would be hard to keep in sync with future 
changes/fixes made to arrow-vector.
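
To make the copy vs. zero-copy distinction concrete, here is a minimal sketch 
that uses only java.nio.ByteBuffer and no Arrow API. The class and method names 
(ZeroCopyDemo, copyOf, viewOf) are hypothetical and exist only for 
illustration; this is not how MessageSerializer is written, just the kind of 
read-only wrapping I would like to be able to do.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ZeroCopyDemo {

    /** Copying read: allocates a new buffer and copies the requested bytes into it. */
    public static ByteBuffer copyOf(ByteBuffer src, int offset, int length) {
        ByteBuffer window = src.duplicate();   // independent position/limit, shared content
        window.clear();                        // position = 0, limit = capacity
        window.limit(offset + length);
        window.position(offset);
        ByteBuffer dst = ByteBuffer.allocate(length);
        dst.put(window);                       // the actual byte copy happens here
        dst.flip();
        return dst.order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Zero-copy read: a read-only view that shares memory with the source buffer. */
    public static ByteBuffer viewOf(ByteBuffer src, int offset, int length) {
        ByteBuffer window = src.duplicate();
        window.clear();
        window.limit(offset + length);
        window.position(offset);
        // slice() starts the view at `offset`; no bytes are copied.
        return window.slice().asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) {
        // Pretend this buffer already holds four little-endian int32 values.
        ByteBuffer src = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 4; i++) {
            src.putInt(i * 4, i + 10);         // 10, 11, 12, 13
        }

        ByteBuffer copy = copyOf(src, 4, 8);   // bytes for values 11 and 12, copied
        ByteBuffer view = viewOf(src, 4, 8);   // the same bytes, shared

        src.putInt(4, 99);                     // change the source after the fact

        System.out.println("view: " + view.getInt(0));  // sees the change: 99
        System.out.println("copy: " + copy.getInt(0));  // still the old value: 11
    }
}
```

The view reflects later changes to the source because it shares the same 
memory, while the copy does not. A read-only path in arrow-vector would need 
something equivalent to viewOf over the buffers I already hold.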

I am wondering whether you have given any thought to such read-only scenarios. 
Any suggestions on how I can approach this myself?

Thanks.


[ Full content available at: https://github.com/apache/arrow/issues/2516 ]
This message was relayed via gitbox.apache.org for [email protected]
