[jira] [Commented] (ARROW-11901) [Java] Investigate potential performance improvement of compression codec

Samuel Audet (Jira) Tue, 26 Oct 2021 21:51:09 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434659#comment-17434659
 ]


Samuel Audet commented on ARROW-11901:
--------------------------------------

[~emkornfield], since the C++ builds of Arrow already include LZ4, it is indeed 
pretty trivial to expose a few JNI methods to access it. The larger picture 
though is that the overall Java API of Arrow itself is still pretty limited and 
inefficient, even after 5 years in development! And there _are_ users such as 
[[email protected]] that require more performance, and that's why 
there are also JavaCPP Presets for the C++ API of Arrow: 
[https://github.com/bytedeco/javacpp-presets/tree/master/arrow]

Now, the C++ API doesn't always map very elegantly to Java, but it is much 
faster and exposes a lot more functionality. This would be a discussion for 
another thread, but if the Java API of Arrow were based on JavaCPP, users 
could easily fall back on the C++ API instead of being forced to write their 
own JNI code. Case in point, the {{arrow::util::Codec}} class has been usable 
from Java for almost 2 years now:
[https://github.com/bytedeco/javacpp-presets/blob/master/arrow/src/gen/java/org/bytedeco/arrow/Codec.java]

I would be happy to maintain those presets as part of the Arrow project, just 
like I'm currently doing in the case of TensorFlow for Java: 
[https://github.com/tensorflow/java/search?q=javacpp]

Previous discussions with people from Apache Arrow didn't elicit much interest, 
but in time the need for a tool like Cython in Java will become obvious to all, 
and JavaCPP already provides that!

> [Java] Investigate potential performance improvement of compression codec
> -------------------------------------------------------------------------
>
>                 Key: ARROW-11901
>                 URL: https://issues.apache.org/jira/browse/ARROW-11901
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Benjamin Wilhelm
>            Priority: Major
>
> In response to the discussion in 
> https://github.com/apache/arrow/pull/8949/files#r588046787
> There are some performance penalties in the implementation of the compression 
> codecs (e.g. copying data between heap and off-heap memory). We need to revise 
> the code to improve the performance. 
> We should also provide some benchmarks to validate that the performance 
> actually improves. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
