[ https://issues.apache.org/jira/browse/ARROW-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17659276#comment-17659276 ]
Rok Mihevc commented on ARROW-2249:
-----------------------------------

This issue has been migrated to [issue #18209|https://github.com/apache/arrow/issues/18209] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Java/Python] in-process vector sharing from Java to Python
> -----------------------------------------------------------
>
>                 Key: ARROW-2249
>                 URL: https://issues.apache.org/jira/browse/ARROW-2249
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Java, Python
>            Reporter: Uwe Korn
>            Priority: Major
>              Labels: beginner
>
> Currently, all applications of Arrow seem to use the IPC capabilities to
> move data between a Java process and a Python process. While this requires
> no serialization, it is not zero-copy. By taking the address and offset, we
> can already create Python buffers from Java buffers:
> https://github.com/apache/arrow/pull/1693. This is still a very low-level
> interface and we should provide the user with:
> * A guide on how to load the Apache Arrow Java libraries in Python (either
> through a fat-jar that is shipped with Arrow or by integrating them into
> the user's own Java packaging)
> * {{pyarrow.Array.from_jvm}}, {{pyarrow.RecordBatch.from_jvm}}, … functions
> that take the respective Java objects and emit Python objects. These Python
> objects should also ensure that the underlying memory regions are kept alive
> as long as the Python objects exist.
> This issue can also be used as a tracker for the various sub-tasks that will
> need to be done to complete this rather large milestone.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
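
The address-and-offset idea from the quoted description can be illustrated with a minimal standard-library sketch. It is not the pyarrow implementation: the class name {{ForeignBytes}} is hypothetical, and a {{ctypes}} allocation stands in for a buffer owned by the JVM. It shows a view built from a raw address without copying, with a reference to the owner held so the memory outlives the view's users. As far as I know, {{pyarrow.foreign_buffer(address, size, base)}} plays this role in pyarrow itself.

```python
import ctypes


class ForeignBytes:
    """Hypothetical zero-copy view over `size` bytes at `address + offset`."""

    def __init__(self, address, offset, size, owner):
        # Holding a reference to the owner keeps the underlying memory
        # region alive for as long as this Python object exists.
        self._owner = owner
        # from_address wraps existing memory; nothing is copied here.
        self._raw = (ctypes.c_ubyte * size).from_address(address + offset)

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, index):
        return self._raw[index]

    def tobytes(self):
        # Explicit copy, used here only to inspect the current contents.
        return bytes(self._raw)


# Assumption: this ctypes buffer stands in for memory allocated on the Java side.
producer = ctypes.create_string_buffer(b"arrow-data")
address = ctypes.addressof(producer)

view = ForeignBytes(address, offset=6, size=4, owner=producer)
first = view.tobytes()

# Writing through the producer is visible through the view,
# which shows that both refer to the same memory.
producer[6] = b"D"
second = view.tobytes()
```

A {{from_jvm}}-style function would additionally need to tie the owner reference to the Java object's lifetime, which this sketch does not attempt.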