vibhatha commented on code in PR #327:
URL: https://github.com/apache/arrow-cookbook/pull/327#discussion_r1332860457


##########
java/source/python_java.rst:
##########
@@ -0,0 +1,261 @@
+.. _arrow-python-java:
+
+========================
+PyArrow Java Integration
+========================
+
+The PyArrow library offers a powerful API for Python that can be integrated with Java applications.
+This document provides a guide on how to enable seamless data exchange between Python and Java components using PyArrow.
+
+.. contents::
+
+Dictionary Data Roundtrip
+=========================
+
+    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
+    and finally re-accessed and validated in Python for data consistency.
+
+
+Python Component:
+-----------------
+
+    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
+    Data is generated in PyArrow and exported through C Data to Java.
+
+.. code-block:: python
+
+    import jpype
+    import jpype.imports
+    from jpype.types import *
+    import pyarrow as pa
+    from pyarrow.cffi import ffi as arrow_c
+
+    # Init the JVM and make MapValuesConsumer class available to Python.
+    jpype.startJVM(classpath=[ "../target/*"])
+    java_c_package = jpype.JPackage("org").apache.arrow.c
+    MapValuesConsumer = JClass('MapValuesConsumer')
+    CDataDictionaryProvider = JClass('org.apache.arrow.c.CDataDictionaryProvider')
+
+    # Starting from Python and generating data
+
+    # Create a Python DictionaryArray
+
+    dictionary = pa.dictionary(pa.int64(), pa.utf8())
+    array = pa.array(["A", "B", "C", "A", "D"], dictionary)
+    print("From Python")
+    print("Dictionary Created: ", array)
+
+    # create the CDataDictionaryProvider instance which is
+    # required to create dictionary array precisely
+    c_provider = CDataDictionaryProvider()
+
+    consumer = MapValuesConsumer(c_provider)
+
+    # Export the Python array through C Data
+    c_array = arrow_c.new("struct ArrowArray*")
+    c_array_ptr = int(arrow_c.cast("uintptr_t", c_array))
+    array._export_to_c(c_array_ptr)
+
+    # Export the Schema of the Array through C Data
+    c_schema = arrow_c.new("struct ArrowSchema*")
+    c_schema_ptr = int(arrow_c.cast("uintptr_t", c_schema))
+    array.type._export_to_c(c_schema_ptr)
+
+    # Send Array and its Schema to the Java function
+    # that will update the dictionary
+    consumer.update(c_array_ptr, c_schema_ptr)
+
+    # Importing updated values from Java to Python
+
+    # Export the Python array through C Data
+    updated_c_array = arrow_c.new("struct ArrowArray*")
+    updated_c_array_ptr = int(arrow_c.cast("uintptr_t", updated_c_array))
+
+    # Export the Schema of the Array through C Data
+    updated_c_schema = arrow_c.new("struct ArrowSchema*")
+    updated_c_schema_ptr = int(arrow_c.cast("uintptr_t", updated_c_schema))
+
+    java_wrapped_array = java_c_package.ArrowArray.wrap(updated_c_array_ptr)
+    java_wrapped_schema = java_c_package.ArrowSchema.wrap(updated_c_schema_ptr)
+
+    java_c_package.Data.exportVector(
+        consumer.getAllocatorForJavaConsumer(),
+        consumer.getVector(),
+        c_provider,
+        java_wrapped_array,
+        java_wrapped_schema
+    )
+
+    print("From Java back to Python")
+    updated_array = pa.Array._import_from_c(updated_c_array_ptr, updated_c_schema_ptr)
+
+    # In Java and Python, the same memory is being accessed through the C Data interface.
+    # Since the array from Java and array created in Python should have same data.
+    assert updated_array.equals(array)
+    print("Updated Array: ", updated_array)
+
+    del updated_array

Review Comment:
   Got it. I have one question, since this API is pretty new to me.
   
   What happens here is that we call Java from Python: the Python VM is up first, and from the Python VM we start a JVM. We then access the memory from Java and create a Python object from it, so the Python object and the Java object point to the same memory. Is this statement correct?
   
   Then what could happen is that Python shuts down its VM and, in the process, tries to shut down the JVM first. The `exportVector` function call into Java would end up calling a function named `release_exported`. This is where we see that warning.
   
   Further, according to a comment in `release_exported` in `jni_wrapper.cc`:
   
   ```c++
   // It is possible for the JVM to be shut down when this is called;
   // guard against that.  Example: Python code using JPype may shut
   // down the JVM before releasing the stream.
   ```
   I believe the warning above could be raised when attempting to delete global references?
   Please correct me if I am wrong. If there is a better and more accurate explanation, I would appreciate learning a few things about it.
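
To make the ordering question concrete, here is a toy model in plain Python, with no JPype or Arrow involved. `FakeJVM`, `ExportedArray`, and `run` are hypothetical names made up for this illustration; it is only a sketch of the scenario described above, under the assumption that releasing an exported array needs a live JVM to drop a global reference. It is not how `jni_wrapper.cc` is actually implemented.

```python
class FakeJVM:
    """Hypothetical stand-in for a JVM that tracks global references."""

    def __init__(self):
        self.alive = True
        self.global_refs = set()

    def new_global_ref(self, name):
        self.global_refs.add(name)
        return name

    def delete_global_ref(self, name):
        # Deleting a reference only makes sense while the JVM is running.
        if not self.alive:
            raise RuntimeError("JVM already shut down")
        self.global_refs.discard(name)

    def shutdown(self):
        self.alive = False


class ExportedArray:
    """Hypothetical model of an exported array with a release callback."""

    def __init__(self, jvm, name):
        self.jvm = jvm
        self.ref = jvm.new_global_ref(name)
        self.released = False

    def release(self):
        # Guarded release: skip the JVM-side cleanup if the JVM is gone.
        if self.released:
            return "already released"
        self.released = True
        if not self.jvm.alive:
            return "skipped: JVM is shut down"
        self.jvm.delete_global_ref(self.ref)
        return "released"


def run(release_before_shutdown):
    """Return (release outcome, number of global refs left behind)."""
    jvm = FakeJVM()
    exported = ExportedArray(jvm, "vector-1")
    if release_before_shutdown:
        outcome = exported.release()
        jvm.shutdown()
    else:
        jvm.shutdown()
        outcome = exported.release()
    return outcome, len(jvm.global_refs)
```

In this model, `run(True)` corresponds to dropping the imported array (for example with `del updated_array`) before the JVM goes away, while `run(False)` corresponds to the JVM being shut down first, in which case the guarded release path is the one taken.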
   


