[
https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662247#comment-17662247
]
Rok Mihevc commented on ARROW-5224:
-----------------------------------
This issue has been migrated to [issue
#16718|https://github.com/apache/arrow/issues/16718] on GitHub. Please see the
[migration documentation|https://github.com/apache/arrow/issues/14542] for
further details.
> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -------------------------------------------------------------------------
>
> Key: ARROW-5224
> URL: https://issues.apache.org/jira/browse/ARROW-5224
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Ji Liu
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h
>
> There is no API to directly serialize/deserialize a ValueVector. The only way
> to do this today is to put a single FieldVector in a VectorSchemaRoot and
> convert it to an ArrowRecordBatch; deserialization needs the same detour in
> reverse.
> Providing a utility class for this may be better. I know all serialization
> should follow the IPC format so that data can be shared between different
> Arrow implementations, but for users who only use the Java API and want to do
> some further optimization this should not be a problem, and we could offer
> them one more option.
> This may benefit Java users who only use ValueVector rather than the IPC
> classes such as ArrowRecordBatch:
> * We could apply shuffle optimizations such as compression and encoding
> algorithms for numerical types, which could greatly improve performance.
> * Serialize/deserialize with the actual buffer size used within the vector,
> since the allocated buffer size is a power of 2 and can be bigger than what
> is really needed.
> * Reduce data conversions (VectorSchemaRoot, ArrowRecordBatch, etc.) to make
> it more user-friendly.
>
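The second point in the quoted description (writing only the bytes a vector actually uses, not its power-of-2 capacity) can be sketched with plain java.nio, without any Arrow classes. Everything here is an assumption made for illustration: TrimmedBufferSketch, its method names, and the length-prefixed layout are hypothetical, are not part of the Arrow Java API, and are not the Arrow IPC format.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a fixed-width int vector; not an Arrow class.
class TrimmedBufferSketch {
    static final int INT_WIDTH = Integer.BYTES;

    /** Rounds a byte count up to the next power of two, mimicking an over-allocating buffer. */
    static int roundUpToPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    /** Allocates a power-of-2 sized buffer and writes the values at its start. */
    static ByteBuffer allocateFor(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(roundUpToPowerOfTwo(values.length * INT_WIDTH));
        for (int v : values) {
            buf.putInt(v);
        }
        return buf;
    }

    /** Writes a 4-byte value count followed by only the used bytes (assumed layout). */
    static byte[] serialize(ByteBuffer data, int valueCount) {
        int used = valueCount * INT_WIDTH;
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + used);
        out.putInt(valueCount);
        ByteBuffer src = data.duplicate();
        src.clear();      // position = 0, limit = capacity
        src.limit(used);  // drop the unused tail of the allocation
        out.put(src);
        return out.array();
    }

    /** Reads the value count, then that many ints. */
    static int[] deserialize(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int[] values = new int[in.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5};
        ByteBuffer data = allocateFor(values);
        byte[] bytes = serialize(data, values.length);
        // 5 ints use 20 bytes, but the buffer holds 32; the payload is 4 + 20 bytes.
        System.out.println("capacity=" + data.capacity() + " serialized=" + bytes.length);
    }
}
```

With five int values the buffer capacity is 32 bytes while the serialized form is 24 bytes, so the unused tail of the allocation is never written.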
--
This message was sent by Atlassian Jira
(v8.20.10#820010)