[
https://issues.apache.org/jira/browse/ARROW-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304543#comment-17304543
]
Bob Tinsman commented on ARROW-11901:
-------------------------------------
Hi, I noticed that you are working on the LZ4 issue. I was curious about it,
since Java and performance are both interests of mine.
I am happy to help by profiling the code.
[~emkornfield] mentioned airlift as being Java-based but still fast, so I
checked it out.
Its core code uses off-heap access, which could explain its speed.
For example, check out the core decompressor code:
[https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/lz4/Lz4RawDecompressor.java]
This is similar to Arrow's vector implementations, which allocate an off-heap
chunk of memory, then use Unsafe methods to access it.
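
As a rough illustration of the off-heap pattern described above, here is a
minimal Java sketch. It uses java.nio.ByteBuffer.allocateDirect as a portable
stand-in for the Unsafe calls, so it is not the code that aircompressor or
Arrow actually run. The class and method names (OffHeapSketch, fillOffHeap,
sumInPlace, sumViaHeapCopy) are hypothetical and come from neither project.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: contrasts in-place off-heap reads with a heap copy.
final class OffHeapSketch {

    // Allocates an off-heap (direct) buffer and writes `count` ints into it.
    static ByteBuffer fillOffHeap(int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(count * Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < count; i++) {
            // Absolute put: writes at a byte offset without moving the position.
            buf.putInt(i * Integer.BYTES, i * 10);
        }
        return buf;
    }

    // Reads the values in place; no intermediate byte[] is allocated.
    static long sumInPlace(ByteBuffer buf, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getInt(i * Integer.BYTES);
        }
        return sum;
    }

    // Copies the off-heap bytes into a heap array first, then reads them.
    // This extra copy is the kind of overhead the issue wants to avoid.
    static long sumViaHeapCopy(ByteBuffer buf, int count) {
        byte[] heap = new byte[count * Integer.BYTES];
        buf.duplicate().get(heap); // duplicate() leaves buf's position untouched
        ByteBuffer wrapped = ByteBuffer.wrap(heap).order(ByteOrder.LITTLE_ENDIAN);
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += wrapped.getInt(i * Integer.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buf = fillOffHeap(4);
        System.out.println("direct buffer: " + buf.isDirect());
        System.out.println("in place:      " + sumInPlace(buf, 4));
        System.out.println("via heap copy: " + sumViaHeapCopy(buf, 4));
    }
}
```

Both methods return the same sum; the difference is that sumViaHeapCopy
allocates and fills a heap array on every call, which is what profiling
should be able to show.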
> [Java] Investigate potential performance improvement of compression codec
> -------------------------------------------------------------------------
>
> Key: ARROW-11901
> URL: https://issues.apache.org/jira/browse/ARROW-11901
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Liya Fan
> Assignee: Benjamin Wilhelm
> Priority: Major
>
> This follows up on the discussion in
> https://github.com/apache/arrow/pull/8949/files#r588046787
> The implementation of the compression codecs incurs some performance
> penalties (e.g. copying data between heap and off-heap memory). We need to
> revise the code to improve performance.
> We should also provide benchmarks to validate that performance
> actually improves.