[ 
https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ji Liu closed ARROW-5224.
-------------------------
    Resolution: Won't Do
      Assignee:     (was: Ji Liu)

> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -------------------------------------------------------------------------
>
>                 Key: ARROW-5224
>                 URL: https://issues.apache.org/jira/browse/ARROW-5224
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Ji Liu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> There is no API to directly serialize/deserialize ValueVector. The only way 
> to implement this is to put a single FieldVector in VectorSchemaRoot and 
> convert it to ArrowRecordBatch, and the deserialize process is as well. 
> Provide a utility class to implement this may be better, I know all 
> serializations should follow IPC format so that data can be shared between 
> different Arrow implementations. But for users who only use Java API and want 
> to do some further optimization, this seem to be no problem and we could 
> provide them a more option.
> This may take some benefits for Java user who only use ValueVector rather 
> than IPC series classes such as ArrowReordBatch:
>  * We could do some shuffle optimization such as compression and some 
> encoding algorithm for numerical type which could greatly improve performance.
>  * Do serialize/deserialize with the actual buffer size within vector since 
> the buffer size is power of 2 which is actually bigger than it really need.
>  * Reduce data conversion(VectorSchemaRoot, ArrowRecordBatch etc) to make it 
> user-friendly.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to