[https://issues.apache.org/jira/browse/ARROW-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434659#comment-17434659]
Samuel Audet commented on ARROW-11901:
--------------------------------------
[~emkornfield], since the C++ builds of Arrow already include LZ4, it is indeed
pretty trivial to expose a few JNI methods to access it. The larger picture,
though, is that Arrow's Java API is still quite limited and inefficient, even
after 5 years of development! And there _are_ users, such as
[[email protected]], who require more performance, which is why
there are also JavaCPP Presets for the C++ API of Arrow:
[https://github.com/bytedeco/javacpp-presets/tree/master/arrow]
Now, the C++ API doesn't always map very elegantly to Java, but it is much
faster and exposes a lot more functionality. This would be a discussion for
another thread, but if Arrow's Java API were based on JavaCPP, users could
easily fall back on the C++ API instead of being forced to hack things
together in JNI. Case in point: the {{arrow::util::Codec}} class has been
usable from Java for almost 2 years now:
[https://github.com/bytedeco/javacpp-presets/blob/master/arrow/src/gen/java/org/bytedeco/arrow/Codec.java]
I would be happy to maintain those presets as part of the Arrow project, just
as I'm currently doing for TensorFlow for Java:
[https://github.com/tensorflow/java/search?q=javacpp]
Previous discussions with people from Apache Arrow didn't generate much
interest, but in time the need for a tool like Cython in Java will become
obvious to all, and JavaCPP already provides that!
> [Java] Investigate potential performance improvement of compression codec
> -------------------------------------------------------------------------
>
> Key: ARROW-11901
> URL: https://issues.apache.org/jira/browse/ARROW-11901
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Liya Fan
> Assignee: Benjamin Wilhelm
> Priority: Major
>
> In response to the discussion in
> https://github.com/apache/arrow/pull/8949/files#r588046787
> There are some performance penalties in the implementation of the compression
> codecs (e.g., copying data between heap and off-heap memory). We need to
> revise the code to improve the performance.
> We should also provide some benchmarks to validate that the performance
> actually improves.
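The heap/off-heap copying penalty described in the issue can be illustrated with the JDK's own `java.util.zip.Deflater`, which since Java 11 accepts `ByteBuffer` input and output. This is a minimal sketch under stated assumptions: DEFLATE is used purely as a stand-in, so this is not Arrow's codec API and not LZ4. It contrasts a path that copies off-heap bytes into a heap array and back with one that compresses direct buffers in place. The class name and the helpers `maxCompressedLength` and `sampleInput` are hypothetical, and the timing loop in `main` is crude, no substitute for a proper JMH benchmark.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class OffHeapCompressionSketch {

    // Hypothetical, deliberately generous upper bound on DEFLATE output size.
    static int maxCompressedLength(int n) {
        return n + n / 8 + 64;
    }

    // Path A: copy off-heap -> heap, compress on the heap, copy heap -> off-heap.
    static ByteBuffer compressWithHeapCopy(ByteBuffer offHeapInput) {
        byte[] heapIn = new byte[offHeapInput.remaining()];
        offHeapInput.duplicate().get(heapIn);                 // copy 1: off-heap -> heap
        byte[] heapOut = new byte[maxCompressedLength(heapIn.length)];
        int len = 0;
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(heapIn);
            deflater.finish();
            while (!deflater.finished() && len < heapOut.length) {
                len += deflater.deflate(heapOut, len, heapOut.length - len);
            }
            if (!deflater.finished()) throw new IllegalStateException("output bound too small");
        } finally {
            deflater.end();
        }
        ByteBuffer out = ByteBuffer.allocateDirect(len);
        out.put(heapOut, 0, len).flip();                      // copy 2: heap -> off-heap
        return out;
    }

    // Path B (Java 11+): the Deflater reads and writes the direct buffers in place.
    static ByteBuffer compressOffHeap(ByteBuffer offHeapInput) {
        ByteBuffer out = ByteBuffer.allocateDirect(maxCompressedLength(offHeapInput.remaining()));
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(offHeapInput.duplicate());      // duplicate leaves the caller's position intact
            deflater.finish();
            while (!deflater.finished() && out.hasRemaining()) {
                deflater.deflate(out);                         // advances out's position
            }
            if (!deflater.finished()) throw new IllegalStateException("output bound too small");
        } finally {
            deflater.end();
        }
        out.flip();
        return out;
    }

    // Inflates a compressed buffer into a heap array; used only to verify the round trip.
    static byte[] decompress(ByteBuffer compressed, int originalLength) {
        byte[] out = new byte[originalLength];
        int off = 0;
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed.duplicate());
            while (off < out.length && !inflater.finished()) {
                int k = inflater.inflate(out, off, out.length - off);
                if (k == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                off += k;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt stream", e);
        } finally {
            inflater.end();
        }
        if (off != originalLength) throw new IllegalStateException("truncated stream");
        return out;
    }

    // Hypothetical helper: builds a highly compressible off-heap buffer of n bytes.
    static ByteBuffer sampleInput(int n) {
        ByteBuffer buf = ByteBuffer.allocateDirect(n);
        for (int i = 0; i < n; i++) buf.put((byte) (i % 16));
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer input = sampleInput(1 << 20);
        byte[] expected = new byte[input.remaining()];
        input.duplicate().get(expected);
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            ByteBuffer a = compressWithHeapCopy(input);
            long t1 = System.nanoTime();
            ByteBuffer b = compressOffHeap(input);
            long t2 = System.nanoTime();
            if (!Arrays.equals(decompress(a, expected.length), expected)
                    || !Arrays.equals(decompress(b, expected.length), expected)) {
                throw new AssertionError("round trip failed");
            }
            System.out.printf("heap-copy: %d us, off-heap: %d us%n", (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
    }
}
```

Both paths produce a direct output buffer, so callers see the same result type; the only difference is the two extra `byte[]` copies in path A, which is the kind of overhead the issue asks to measure and remove.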