[ 
https://issues.apache.org/jira/browse/ARROW-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304543#comment-17304543
 ] 

Bob Tinsman commented on ARROW-11901:
-------------------------------------

Hi, I noticed that you are working on the LZ4 issue. I was curious about it, 
since Java and performance are both interests of mine.

I am happy to help by profiling code.

[~emkornfield] mentioned airlift's aircompressor as being Java-based but still 
fast, so I checked it out.

Its core code uses off-heap access, which could explain its speed.

For example, check out the core decompressor code: 
[https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/lz4/Lz4RawDecompressor.java]

This is similar to Arrow's vector implementations, which allocate an off-heap 
chunk of memory, then use Unsafe methods to access it.
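
To illustrate the access pattern, here is a minimal sketch of allocating an 
off-heap chunk and reading/writing it through sun.misc.Unsafe. It is not code 
from Arrow or aircompressor; the class name OffHeapSketch and the method 
sumOffHeap are hypothetical, and obtaining Unsafe via reflection is just the 
usual workaround for code outside the JDK. Newer JDKs may print deprecation 
warnings for these methods.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical illustration only: not taken from Arrow or aircompressor.
public class OffHeapSketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Unsafe has no public constructor, so read its singleton field reflectively.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Writes the ints 0..count-1 into an off-heap chunk, then reads them back
    // and returns their sum. No heap array is involved in the data path.
    static long sumOffHeap(int count) {
        long address = UNSAFE.allocateMemory((long) count * Integer.BYTES);
        try {
            for (int i = 0; i < count; i++) {
                UNSAFE.putInt(address + (long) i * Integer.BYTES, i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += UNSAFE.getInt(address + (long) i * Integer.BYTES);
            }
            return sum;
        } finally {
            // Off-heap memory is not garbage collected; free it explicitly.
            UNSAFE.freeMemory(address);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(4));
    }
}
```

Accessing memory by raw address like this skips bounds checks and avoids 
copying into a byte[], which is presumably where the speed comes from, but I 
have not measured it yet.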

> [Java] Investigate potential performance improvement of compression codec
> -------------------------------------------------------------------------
>
>                 Key: ARROW-11901
>                 URL: https://issues.apache.org/jira/browse/ARROW-11901
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Benjamin Wilhelm
>            Priority: Major
>
> In response to the discussion in 
> https://github.com/apache/arrow/pull/8949/files#r588046787
> There are some performance penalties in the implementation of the compression 
> codecs (e.g. data copying between heap/off-heap data). We need to revise the 
> code to improve the performance. 
> We should also provide some benchmarks to validate that the performance 
> actually improves. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
