parthchandra commented on code in PR #2163: URL: https://github.com/apache/datafusion-comet/pull/2163#discussion_r2283402464
########## native/core/src/execution/jni_api.rs: ########## @@ -361,11 +361,11 @@ fn prepare_output( new_array .to_data() - .move_to_spark(array_addrs[i], schema_addrs[i])?; + .move_to_ffi(array_addrs[i], schema_addrs[i])?; Review Comment: `ffi` is really just the mechanism, isn't it? It doesn't itself have ownership at any point in time. This method does in fact move the ownership to Spark. Or am I mistaken? ########## docs/source/contributor-guide/plugin_overview.md: ########## @@ -140,8 +140,31 @@ accessing Arrow data structures from multiple languages. [Arrow C Data Interface]: https://arrow.apache.org/docs/format/CDataInterface.html -- `CometExecIterator` invokes native plans and uses Arrow FFI to read the output batches -- Native `ScanExec` operators call `CometBatchIterator` via JNI to fetch input batches from the JVM +### Array Ownership and Lifecycle + +#### Importing Batches from Native to JVM + +`CometExecIterator` invokes native plans by calling the JNI function `executePlan`. The ownership of the output +batches, which are created in native code, is transferred to FFI ready to be consumed by Java once the `executePlan` +function returns. + +Once the JVM code finishes processing the arrays, it will call the `release` callback, which invokes native code +to release the memory that was allocated on the native side. + +#### Exporting Batches from JVM to Native + +The leaf nodes of native plans are often `ScanExec` operators, which call `CometBatchIterator` via JNI to fetch +input batches from the JVM. + +Note that this approach does not follow best practice and does not benefit from zero-copy transfer. + +The incoming array data is owned by the JVM and can be freed or reused on the JVM side during the next call to Review Comment: Is this correct? `native_comet` uses JVM to read data from the data file(s). This data is passed to the native side as a java byte array when the native code calls `org.apache.comet.parquet.Native.setPageV1`. From what I understand the native decoder decodes this byte array and constructs arrow vectors. This is where the memory ownership gets fuzzy because, from what I now understand, the vectors created by the decoder are backed by mutable buffers _which can be reused_. At this point the memory ownership model becomes unsafe. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org