[ https://issues.apache.org/jira/browse/ARROW-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444339#comment-17444339 ]
Samuel Audet commented on ARROW-11901:
--------------------------------------
{quote}This would be a dev@ mailing list discussion. I don't think we
would eliminate the existing API, but there might be some interest in
alternative Java APIs.
{quote}
It's not about eliminating anything; it's about developing the existing Java
API, for example for this very specific use case of compression codecs.
[[email protected]] was able to wrap LZ4 using JavaCPP, all by
himself! It's a lot easier than coding everything manually with JNI:
[https://github.com/bytedeco/javacpp-presets/pull/1094]
The Python API of Arrow isn't just automatically generated wrappers around the
C++ API using Cython, right? It's the same for Java. We can use tools like
Cython to make the life of Python developers easier, so why not do the same for
Java developers?
We were able to cut the wrapping code in half by rebasing the Java API of
TensorFlow on JavaCPP, and performance increased to boot:
[https://github.com/tensorflow/java/pull/18#issuecomment-579600568]
We could do the same for Arrow!
{quote}[[email protected]] Do you have pointers? I looked maybe too
quickly and didn't see it used in other Apache projects, for instance. If you
have something that works for your use case, that is great, and if you want to
open-source it, that is also great, but it might need to live in a KNIME-hosted
project for the time being. I believe Arrow is now building JNI bindings for
all major platforms, so the release story is a little bit better for JNI code
hosted by Arrow. I'll see how hard it would be to make the bindings at this point.
{quote}
When it comes to Apache projects, I tried to donate the JavaCPP Presets for
MXNet, but they don't seem interested anymore:
[https://github.com/apache/incubator-mxnet/pull/19797]
I'm also publishing builds for Apache TVM, but again, they are not getting much
traction:
[http://bytedeco.org/news/2020/12/12/deploy-models-with-javacpp-and-tvm/]
If you have some ideas as to why most engineers are OK using Cython in the case
of Python, but not the equivalent in the case of Java, I would be very much
interested in hearing your opinions.
> [Java] Investigate potential performance improvement of compression codec
> -------------------------------------------------------------------------
>
> Key: ARROW-11901
> URL: https://issues.apache.org/jira/browse/ARROW-11901
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Liya Fan
> Assignee: Benjamin Wilhelm
> Priority: Major
>
> In response to the discussion in
> https://github.com/apache/arrow/pull/8949/files#r588046787
> There are some performance penalties in the implementation of the compression
> codecs (e.g. data copying between heap/off-heap data). We need to revise the
> code to improve the performance.
> We should also provide some benchmarks to validate that the performance
> actually improves.
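To make the heap/off-heap copying mentioned in the issue description concrete,
here is a minimal sketch of the two paths. It is not Arrow's codec code: it
uses java.util.zip.Deflater from the JDK as a stand-in codec, the class and
method names (OffHeapCopySketch, compressViaHeapCopy, compressDirect) are
hypothetical, and the ByteBuffer overloads it relies on assume Java 11 or
later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; not part of Arrow.
class OffHeapCopySketch {

    // Generous output bound so that a single deflate() call can finish.
    static int bound(int len) {
        return len + len / 8 + 64;
    }

    // Path with the penalty: stage the off-heap input in a heap byte[],
    // compress on the heap, then copy the result back off-heap.
    static ByteBuffer compressViaHeapCopy(ByteBuffer offHeapInput) {
        byte[] heapIn = new byte[offHeapInput.remaining()];
        offHeapInput.duplicate().get(heapIn);        // copy 1: off-heap -> heap

        Deflater deflater = new Deflater();
        deflater.setInput(heapIn);
        deflater.finish();
        byte[] heapOut = new byte[bound(heapIn.length)];
        int n = deflater.deflate(heapOut);
        boolean done = deflater.finished();
        deflater.end();
        if (!done) throw new IllegalStateException("output buffer too small");

        ByteBuffer out = ByteBuffer.allocateDirect(n);
        out.put(heapOut, 0, n);                      // copy 2: heap -> off-heap
        out.flip();
        return out;
    }

    // Same codec, but both input and output are passed as direct buffers,
    // so no staging arrays are needed in user code.
    static ByteBuffer compressDirect(ByteBuffer offHeapInput) {
        Deflater deflater = new Deflater();
        deflater.setInput(offHeapInput.duplicate()); // duplicate keeps caller's position
        deflater.finish();
        ByteBuffer out = ByteBuffer.allocateDirect(bound(offHeapInput.remaining()));
        deflater.deflate(out);
        boolean done = deflater.finished();
        deflater.end();
        if (!done) throw new IllegalStateException("output buffer too small");
        out.flip();
        return out;
    }

    // Round-trip helper used to check that both paths produce valid output.
    static byte[] decompress(ByteBuffer compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed.duplicate());
        byte[] result = new byte[originalLength];
        try {
            int n = inflater.inflate(result);
            if (n != originalLength) throw new IllegalStateException("short read: " + n);
            return result;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = "arrow arrow arrow arrow arrow arrow".getBytes(StandardCharsets.UTF_8);
        ByteBuffer input = ByteBuffer.allocateDirect(data.length);
        input.put(data);
        input.flip();

        ByteBuffer viaHeap = compressViaHeapCopy(input);
        ByteBuffer direct = compressDirect(input);
        System.out.println("heap-copy path round-trips: "
                + Arrays.equals(data, decompress(viaHeap, data.length)));
        System.out.println("direct path round-trips:    "
                + Arrays.equals(data, decompress(direct, data.length)));
    }
}
```

Whether the direct path is actually faster for a given codec and buffer size
is exactly what the benchmarks requested in the issue would have to show; the
sketch only makes the two extra copies visible.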