parthchandra commented on code in PR #2163:
URL: https://github.com/apache/datafusion-comet/pull/2163#discussion_r2283402464
##########
native/core/src/execution/jni_api.rs:
##########
@@ -361,11 +361,11 @@ fn prepare_output(
new_array
.to_data()
- .move_to_spark(array_addrs[i], schema_addrs[i])?;
+ .move_to_ffi(array_addrs[i], schema_addrs[i])?;
Review Comment:
`ffi` is really just the mechanism, isn't it? It doesn't itself have
ownership at any point in time. This method does in fact move the ownership to
Spark. Or am I mistaken?
##########
docs/source/contributor-guide/plugin_overview.md:
##########
@@ -140,8 +140,31 @@ accessing Arrow data structures from multiple languages.
[Arrow C Data Interface]:
https://arrow.apache.org/docs/format/CDataInterface.html
-- `CometExecIterator` invokes native plans and uses Arrow FFI to read the
output batches
-- Native `ScanExec` operators call `CometBatchIterator` via JNI to fetch input
batches from the JVM
+### Array Ownership and Lifecycle
+
+#### Importing Batches from Native to JVM
+
+`CometExecIterator` invokes native plans by calling the JNI function
`executePlan`. The ownership of the output
+batches, which are created in native code, is transferred to FFI ready to be
consumed by Java once the `executePlan`
+function returns.
+
+Once the JVM code finishes processing the arrays, it will call the `release`
callback, which invokes native code
+to release the memory that was allocated on the native side.
+
+#### Exporting Batches from JVM to Native
+
+The leaf nodes of native plans are often `ScanExec` operators, which call
`CometBatchIterator` via JNI to fetch
+input batches from the JVM.
+
+Note that this approach does not follow best practice and does not benefit
from zero-copy transfer.
+
+The incoming array data is owned by the JVM and can be freed or reused on the
JVM side during the next call to
Review Comment:
Is this correct? `native_comet` uses JVM to read data from the data file(s).
This data is passed to the native side as a java byte array when the native
code calls `org.apache.comet.parquet.Native.setPageV1`. From what I understand
the native decoder decodes this byte array and constructs arrow vectors. This
is where the memory ownership gets fuzzy because, from what I now understand,
the vectors created by the decoder are backed by mutable buffers _which can be
reused_. At this point the memory ownership model becomes unsafe.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]