[ https://issues.apache.org/jira/browse/ARROW-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835303#comment-16835303 ]

Liya Fan commented on ARROW-5200:
---------------------------------

This is the code (source, bytecode, and assembly) produced for the original
Arrow API of Float8Vector: !safe_nocheck.jpg!

And this is the corresponding code for the unsafe API of Float8Vector:
!unsafe.jpg!

As these figures show, the unsafe API results in less code at all three levels
(source, bytecode, and assembly).
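To make the difference concrete, here is a minimal, hypothetical Java sketch of the two access styles. `Float8Column`, `getChecked`, and `getUnchecked` are illustrative names, not Arrow classes or methods, and the layout (a validity bitmap plus an 8-byte value buffer, both in direct `ByteBuffer`s) is only loosely modeled on Float8Vector. The checked accessor validates the index and the null bit before reading; the unchecked accessor trusts the caller and reads directly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical fixed-width column of doubles stored off-heap, loosely
 * modeled on a Float8Vector layout: a validity bitmap plus a value buffer.
 * This is NOT the Arrow API; names and methods are illustrative only.
 */
final class Float8Column {
    private final ByteBuffer validity; // 1 bit per slot, 1 = non-null
    private final ByteBuffer values;   // 8 bytes per slot
    private final int valueCount;

    Float8Column(int valueCount) {
        this.valueCount = valueCount;
        // allocateDirect zero-fills, so every slot starts out null.
        this.validity = ByteBuffer.allocateDirect((valueCount + 7) / 8);
        this.values = ByteBuffer.allocateDirect(valueCount * 8)
                .order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Stores a value and marks the slot as non-null. */
    void set(int index, double value) {
        values.putDouble(index * 8, value);
        int byteIndex = index >>> 3;
        validity.put(byteIndex, (byte) (validity.get(byteIndex) | (1 << (index & 7))));
    }

    boolean isNull(int index) {
        return (validity.get(index >>> 3) & (1 << (index & 7))) == 0;
    }

    /** "Heavy-weight" access: application-level bounds and null checks first. */
    double getChecked(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index + ", valueCount " + valueCount);
        }
        if (isNull(index)) {
            throw new IllegalStateException("value at index " + index + " is null");
        }
        return values.getDouble(index * 8);
    }

    /**
     * "Light-weight" access: the caller guarantees the index is in range and
     * the slot is non-null, so both checks are skipped. A null slot silently
     * returns whatever bytes are stored there.
     */
    double getUnchecked(int index) {
        return values.getDouble(index * 8);
    }
}
```

Note that `ByteBuffer.getDouble` still performs its own internal bounds check, so this sketch only shows which application-level checks are dropped; it does not reproduce the raw-memory reads an unsafe API would use.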

> [Java] Provide light-weight arrow APIs
> --------------------------------------
>
>                 Key: ARROW-5200
>                 URL: https://issues.apache.org/jira/browse/ARROW-5200
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2019-04-23-15-19-34-187.png, safe_nocheck.jpg, unsafe.jpg
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We are trying to incorporate Apache Arrow into the Apache Flink runtime. We
> find Arrow an amazing library, which greatly simplifies support for the
> columnar data format.
> However, in many scenarios, we find the performance unacceptable. Our
> investigation shows that the cause is the large number of redundant checks
> and computations in the Arrow API.
> For example, the following figure shows that a single call to the
> Float8Vector.get(int) method (one of the most frequently used APIs in Flink
> computation) involves 20+ method invocations.
> !image-2019-04-23-15-19-34-187.png!
>  
> There are many other APIs with similar problems. We believe that these checks
> help ensure the integrity of the program. However, they also severely impact
> performance. In our evaluation, performance may degrade by two or three orders
> of magnitude compared to accessing data in heap memory.
> We think that, at least for some scenarios, the responsibility for integrity
> checks can be given to application owners. If they can be sure that all the
> checks would pass, we can provide them with light-weight APIs and the high
> performance that comes with them.
> The light-weight APIs provide only minimal checks, or no checks at all. The
> application owner can still develop and debug their code using the original
> heavy-weight APIs. Once all bugs have been fixed, they can switch to the
> light-weight APIs in their products and enjoy the resulting high performance.
>  
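One way to support the workflow described above (develop and debug with checks, then ship without them) is a single switch read at startup. The sketch below is hypothetical: `CheckedAccess` is an illustrative class, the system property `example.checks.enabled` is an assumed name (not an Arrow or Flink setting), and on-heap arrays are used for brevity, so it illustrates the switch rather than off-heap access.

```java
/**
 * Hypothetical accessor whose integrity checks are controlled by a
 * static final flag read once at class-initialization time. The property
 * name "example.checks.enabled" is an assumption for illustration only.
 */
final class CheckedAccess {
    // Read once; defaults to true so development runs keep all checks.
    static final boolean CHECKS_ENABLED =
            Boolean.parseBoolean(System.getProperty("example.checks.enabled", "true"));

    private CheckedAccess() {}

    /**
     * Returns values[index]. When CHECKS_ENABLED is true, the index range and
     * the validity flag are verified first; when false, the caller is trusted.
     */
    static double get(double[] values, boolean[] validity, int index) {
        if (CHECKS_ENABLED) {
            if (index < 0 || index >= values.length) {
                throw new IndexOutOfBoundsException("index " + index + ", length " + values.length);
            }
            if (!validity[index]) {
                throw new IllegalStateException("value at index " + index + " is null");
            }
        }
        return values[index];
    }
}
```

Running with `-Dexample.checks.enabled=false` disables the checks. Because `CHECKS_ENABLED` is a `static final` field, HotSpot's JIT compiler can treat it as a constant once the class is initialized and eliminate the disabled branch, so the light-weight path does not pay for the checks. The JVM's own array bounds check on `values[index]` remains in either mode.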



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
