andygrove opened a new pull request, #3832:
URL: https://github.com/apache/datafusion-comet/pull/3832

   ## Which issue does this PR close?
   
   Closes #2470.
   
   ## Rationale for this change
   
   When running with reduced off-heap memory, JNI `GlobalRef` objects inside 
`JavaException` errors can be dropped on tokio worker threads that are not 
attached to the JVM. This triggers warnings from the `jni` crate (`"Dropping a 
GlobalRef in a detached thread"`) and causes expensive temporary attach/detach 
cycles.
   
   The root cause: memory pool methods (`acquire_memory`/`release_memory`) call 
JNI via a temporary `AttachGuard`. If the JNI call throws a Java exception, a 
`CometError::JavaException` is created containing a `GlobalRef` to the 
throwable. When the method returns, the `AttachGuard` drops and detaches the 
thread — but the `GlobalRef` inside the error outlives it. As the error 
propagates through DataFusion on the now-detached tokio thread and is 
eventually converted to `DataFusionError::Execution(String)`, the `GlobalRef` 
is dropped on the detached thread, triggering the warning.
   
   ## What changes are included in this PR?
   
   - Add `CometError::drop_throwable()` which converts `JavaException` errors 
(containing a `GlobalRef`) into string-only `Internal` errors, ensuring the 
`GlobalRef` is dropped while the thread is still JVM-attached.
   - Apply `.map_err(CometError::drop_throwable)` in all four memory pool JNI 
methods: `acquire_from_spark`, `release_to_spark` (unified pool), `acquire`, 
`release` (fair pool).
   
   The `GlobalRef` is safe to drop early here because these errors always 
propagate through `DataFusionError::Execution(String)` which stringifies the 
error anyway — the throwable reference is never used to re-throw the original 
Java exception from this path.
   
   ## How are these changes tested?
   
   This is difficult to test in a unit test since it requires a full Spark 
executor environment with memory pressure on tokio worker threads. The fix is 
verified by code inspection: `map_err` executes while the `AttachGuard` (`env`) 
is still in scope, so the `GlobalRef` is released on an attached thread. Clippy 
passes cleanly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to