[ 
https://issues.apache.org/jira/browse/ARROW-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835321#comment-16835321
 ] 

Ji Liu commented on ARROW-5207:
-------------------------------

[~jnadeau] Thanks for your feedback. Agreed, it's not an important
optimization. However, users have different usage scenarios for Arrow; for
example, they may work at the ArrowBuf or ValueVector level rather than with
ArrowRecordBatch. It may be better to offer more options. If you think this
breaks the design of ValueVector, I think we should at least provide a utility
class to support operations such as serializing a ValueVector and resizing
buffers, as mentioned above. What do you think?
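
To make the idea concrete, here is a rough sketch of the reuse check described in the issue: recordCount and dataLength are written ahead of the payload, and the reader reallocates only when dataLength exceeds the capacity it already holds. This is only an illustration under assumptions. The class name ReusableBufferSketch and its methods are hypothetical and not part of the Arrow API, and a plain byte[] stands in for an ArrowBuf.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical helper illustrating buffer reuse; not an Arrow class. */
public class ReusableBufferSketch {
    private byte[] data = new byte[0]; // stands in for a vector's data buffer
    private int recordCount;
    private int dataLength;
    private int reallocCount;          // tracked only to show how often we reallocate

    /** Writes recordCount and dataLength as a header, followed by the payload. */
    public static void write(DataOutputStream out, int recordCount, byte[] payload)
            throws IOException {
        out.writeInt(recordCount);
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Reads one batch, reallocating only if the existing capacity is too small. */
    public void read(DataInputStream in) throws IOException {
        recordCount = in.readInt();
        dataLength = in.readInt();
        if (dataLength > data.length) {
            data = new byte[dataLength];
            reallocCount++;
        }
        in.readFully(data, 0, dataLength);
    }

    public int getRecordCount() { return recordCount; }
    public int getDataLength() { return dataLength; }
    public int getCapacity() { return data.length; }
    public int getReallocCount() { return reallocCount; }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        write(out, 4, new byte[16]);
        write(out, 2, new byte[8]);  // fits in the 16-byte buffer, no realloc
        write(out, 8, new byte[32]); // exceeds capacity, realloc needed
        out.flush();

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        ReusableBufferSketch buf = new ReusableBufferSketch();
        for (int i = 0; i < 3; i++) {
            buf.read(in);
            System.out.println("recordCount=" + buf.getRecordCount()
                    + " dataLength=" + buf.getDataLength()
                    + " capacity=" + buf.getCapacity());
        }
        System.out.println("reallocs=" + buf.getReallocCount());
    }
}
```

In this sketch the same reader instance handles three batches and reallocates twice, since the second batch fits in the capacity left over from the first. A real utility would also need to deal with validity and offset buffers, which are left out here.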

 

> [Java] add APIs to support vector reuse
> ---------------------------------------
>
>                 Key: ARROW-5207
>                 URL: https://issues.apache.org/jira/browse/ARROW-5207
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Ji Liu
>            Assignee: Ji Liu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In some scenarios we would like a ValueVector to be reusable, to reduce
> creation overhead. This is very common in the shuffle stage: there is no need
> to create a ValueVector or realloc its buffers every time. Suppose the
> recordCount of the ValueVector and the capacity of its buffers are written to
> the stream; when we deserialize, we can decide whether a realloc is needed
> simply by checking dataLength.
> My proposal is to add APIs to ValueVector to handle this logic; otherwise,
> users who want to reuse vectors have to implement it themselves, which is not
> user-friendly.
> If you agree, I would like to take this ticket. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
