[ https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838761#comment-16838761 ]
Bryan Cutler commented on ARROW-5224:
-------------------------------------
[~tianchen92] Could you encode the values of the BigIntVector as LEB128 into
a VarBinaryVector, and then serialize that vector as an Arrow RecordBatch?
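
To make the suggestion concrete, here is a minimal sketch of signed LEB128 for
64-bit values in plain Java. It is an illustration only and does not use the
Arrow API: the class name Leb128Sketch and its methods are hypothetical, and
the assumption is that each encoded byte[] would become one cell of the
VarBinaryVector.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper, not part of Arrow: signed LEB128 for 64-bit values. */
final class Leb128Sketch {

    /** Encodes one signed long using 7 payload bits per byte. */
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean more = true;
        while (more) {
            int b = (int) (value & 0x7F);
            value >>= 7; // arithmetic shift, so the sign is preserved
            boolean signBit = (b & 0x40) != 0;
            // Stop once the remaining bits are pure sign extension.
            if ((value == 0 && !signBit) || (value == -1 && signBit)) {
                more = false;
            } else {
                b |= 0x80; // continuation bit: more bytes follow
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    /** Decodes one signed long from a complete LEB128 byte sequence. */
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        int i = 0;
        int b;
        do {
            b = bytes[i++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        // Sign-extend if the last byte carried a set sign bit.
        if (shift < 64 && (b & 0x40) != 0) {
            result |= -1L << shift;
        }
        return result;
    }
}
```

Small magnitudes take one or two bytes instead of eight, while values near
Long.MIN_VALUE or Long.MAX_VALUE take ten, so whether this saves space depends
on the data in the vector.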
> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -------------------------------------------------------------------------
>
> Key: ARROW-5224
> URL: https://issues.apache.org/jira/browse/ARROW-5224
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Ji Liu
> Assignee: Ji Liu
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> There is no API to directly serialize/deserialize a ValueVector. The only way
> to do this is to put a single FieldVector in a VectorSchemaRoot and convert
> it to an ArrowRecordBatch; the deserialization process takes the same route.
> Providing a utility class for this may be better. I know all serialization
> should follow the IPC format so that data can be shared between different
> Arrow implementations. But for users who only use the Java API and want to
> do some further optimization, this should not be a problem, and we could
> give them one more option.
> This may benefit Java users who only use ValueVector rather than the
> IPC-related classes such as ArrowRecordBatch:
> * We could do shuffle optimizations such as compression and encoding
> algorithms for numerical types, which could greatly improve performance.
> * Serialize/deserialize with the actual buffer size within the vector, since
> the allocated buffer size is a power of 2, which is larger than really needed.
> * Reduce data conversion (VectorSchemaRoot, ArrowRecordBatch, etc.) to make
> it more user-friendly.
>
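
The second bullet in the description (writing only the bytes actually used
rather than the whole power-of-2 allocation) can be sketched in plain Java as
follows. This is an assumption-laden illustration, not the proposed Arrow API:
the class CompactBufferSketch, its methods, and the framing (a 4-byte value
count followed by the values) are all hypothetical, and a java.nio.ByteBuffer
stands in for the vector's data buffer.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: serialize only the used part of an over-allocated buffer. */
final class CompactBufferSketch {

    /** Rounds n up to the next power of two, mimicking the allocation behaviour. */
    static int roundUpToPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    /** Stores the values in a buffer whose capacity is a power of two. */
    static ByteBuffer fill(long[] values) {
        ByteBuffer data = ByteBuffer.allocate(roundUpToPowerOfTwo(values.length * Long.BYTES));
        for (long v : values) {
            data.putLong(v);
        }
        return data;
    }

    /** Writes a 4-byte value count, then only the bytes that hold values. */
    static byte[] serialize(ByteBuffer data, int valueCount) {
        int used = valueCount * Long.BYTES;
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + used);
        out.putInt(valueCount);
        ByteBuffer view = data.duplicate(); // leave the caller's position untouched
        view.clear();
        view.limit(used);                   // skip the unused tail of the allocation
        out.put(view);
        return out.array();
    }

    /** Reads the value count and then that many longs. */
    static long[] deserialize(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        long[] values = new long[in.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.getLong();
        }
        return values;
    }
}
```

For five 64-bit values the buffer capacity here is 64 bytes, but only 44 bytes
(4 for the count plus 40 for the values) are written. A real implementation
would also have to carry the validity buffer and type information, which this
sketch leaves out.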