[ 
https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838299#comment-16838299
 ] 

Ji Liu commented on ARROW-5224:
-------------------------------

[[email protected]] Thanks for your reply.

For #2 you are right.

For #1, consider encoding Int or BigInt values with a variable-length scheme 
such as LEB128 [https://en.wikipedia.org/wiki/LEB128]: we would need to read 
each value and reassemble its bytes, and do the reverse when deserializing. 
Can this be achieved with the existing implementation?
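
As a rough illustration of the per-value work involved, here is a minimal 
sketch of unsigned LEB128 encoding and decoding. It operates on a plain int[] 
rather than an Arrow vector, and the class name Leb128Sketch and its methods 
are hypothetical, not part of any Arrow API:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, not an Arrow class: unsigned LEB128 over int values.
class Leb128Sketch {

    // Encode each value as 7-bit groups, low group first; the high bit of a
    // byte is set when more groups follow.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7; // unsigned shift, so negative ints terminate too
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    // Decode exactly `count` values by reassembling the 7-bit groups.
    static int[] decode(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int result = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                result |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = result;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 127, 128, 300, 624485};
        byte[] encoded = encode(values);
        int[] decoded = decode(encoded, values.length);
        System.out.println(encoded.length + " bytes instead of " + values.length * 4);
        System.out.println(Arrays.equals(values, decoded));
    }
}
```

Small values take one or two bytes instead of four, but every value has to be 
visited individually on both the write and the read side, which is the step 
the question above is about.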

> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -------------------------------------------------------------------------
>
>                 Key: ARROW-5224
>                 URL: https://issues.apache.org/jira/browse/ARROW-5224
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Ji Liu
>            Assignee: Ji Liu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There is no API to directly serialize/deserialize a ValueVector. The only way 
> to do this today is to put a single FieldVector in a VectorSchemaRoot and 
> convert it to an ArrowRecordBatch, and deserialization works the same way in 
> reverse. It may be better to provide a utility class for this. I know all 
> serialization should follow the IPC format so that data can be shared between 
> different Arrow implementations, but for users who only use the Java API and 
> want to do further optimization this should not be a problem, and we could 
> give them one more option.
> This may benefit Java users who only use ValueVector rather than the IPC 
> classes such as ArrowRecordBatch:
>  * We could apply shuffle optimizations such as compression and encoding 
> algorithms for numerical types, which could greatly improve performance.
>  * Serialize/deserialize using the actual data size within the vector, since 
> the allocated buffer size is a power of 2 and is often larger than needed.
>  * Reduce data conversions (VectorSchemaRoot, ArrowRecordBatch, etc.) to make 
> it more user-friendly.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
