[ 
https://issues.apache.org/jira/browse/ARROW-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated ARROW-2249:
-------------------------------
    Description: 
Currently, all applications of Arrow seem to use the IPC capabilities to 
move data between a Java process and a Python process. While this requires no 
serialization, it is not zero-copy. By taking the address and offset, we can 
already create Python buffers from Java buffers: 
https://github.com/apache/arrow/pull/1693. This is still a very low-level 
interface and we should provide the user with:

* A guide on how to load the Apache Arrow Java libraries in Python (either 
through a fat JAR shipped with Arrow or by integrating them into the user's 
own Java packaging)
* {{pyarrow.Array.from_jvm}}, {{pyarrow.RecordBatch.from_jvm}}, … functions 
that take the respective Java objects and emit Python objects. These Python 
objects should also ensure that the underlying memory regions are kept alive as 
long as the Python objects exist.
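
To illustrate the two requirements above (a buffer built from an address plus 
offset, and a Python object that keeps the underlying memory alive), here is a 
minimal sketch using only the standard library. It is not the pyarrow or Arrow 
Java API: {{ForeignBuffer}} and {{buffer_from_owner}} are hypothetical names, 
and a ctypes allocation stands in for memory owned by the JVM.

```python
import ctypes


class ForeignBuffer:
    """Zero-copy view over memory that is owned by another object.

    Hypothetical helper for illustration only; not part of pyarrow.
    """

    def __init__(self, address, size, base):
        # Holding a reference to `base` keeps the owning object (and so the
        # memory region) alive for as long as this buffer exists.
        self.base = base
        # Map `size` bytes starting at `address` without copying them.
        self._cells = (ctypes.c_ubyte * size).from_address(address)

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, index):
        return self._cells[index]

    def tobytes(self):
        # Explicit copy, only made when the caller asks for one.
        return bytes(self._cells)


def buffer_from_owner(owner, offset, size):
    """Build a ForeignBuffer from an owner's base address plus an offset."""
    address = ctypes.addressof(owner) + offset
    return ForeignBuffer(address, size, base=owner)


# Stand-in for a memory region allocated on the Java side (assumption).
jvm_memory = ctypes.create_string_buffer(b"0123456789")
buf = buffer_from_owner(jvm_memory, offset=2, size=4)

# A write through the owner is visible through the view: no copy was made.
jvm_memory[2] = b"X"
first_byte = buf[0]

# Dropping the original name does not free the memory, because the buffer
# still references its owner through `base`.
del jvm_memory
snapshot = buf.tobytes()
```

A real {{from_jvm}} implementation would obtain the address and size from the 
Java buffer object and hold a reference to that Java object in place of 
{{base}}; how that reference is obtained and released is left open here.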

This issue can also be used as a tracker for the various sub-tasks that will 
need to be done to complete this rather large milestone.

  was:
Currently we seem to use in all applications of Arrow the IPC capabilities to 
move data between a Java process and a Python process. While this is 
0-serialization, it is not zero-copy. I'm going to have a first shot at 
exposing Java Vectors in Python as {{pyarrow.Array}}.

This issue can also be used as a tracker for the various sub-tasks that will 
need to be done to complete this rather large milestone.


> [Java/Python] in-process vector sharing from Java to Python
> -----------------------------------------------------------
>
>                 Key: ARROW-2249
>                 URL: https://issues.apache.org/jira/browse/ARROW-2249
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Java - Vectors, Python
>            Reporter: Uwe L. Korn
>            Assignee: Uwe L. Korn
>            Priority: Major
>              Labels: beginner
>             Fix For: 0.10.0
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
