Kontinuation commented on issue #884:
URL: 
https://github.com/apache/datafusion-comet/issues/884#issuecomment-2316896912

   I've investigated the problem and found that the leak is caused by these 2 
allocations: 
https://github.com/apache/datafusion-comet/blob/0.2.0/common/src/main/scala/org/apache/comet/vector/NativeUtil.scala#L65-L66
   
   ```scala
   val arrowSchema = ArrowSchema.allocateNew(allocator)
   val arrowArray = ArrowArray.allocateNew(allocator)
   ```
   
   These allocations construct the Arrow C data structures used to transfer 
Arrow batch vectors from Scala (JVM) to the native executor (Rust). The native 
executor moves the transferred vectors and takes ownership of them, but the 
`arrowSchema` and `arrowArray` base structures allocated in the JVM are never 
released. Each time we transfer a batch from the JVM to the native executor, 
we leak 2 base structures' worth of memory.
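   
   A minimal sketch of how the base structures could be freed once the native 
side has taken ownership of their contents. This is an illustration only and 
not necessarily what the linked commit does: `ExportSketch`, 
`exportAndRelease` and the `consume` callback (a stand-in for the JNI call 
into the native executor) are hypothetical names. It assumes `consume` has 
moved the exported contents before it returns.
   
   ```scala
   import org.apache.arrow.c.{ArrowArray, ArrowSchema, Data}
   import org.apache.arrow.memory.BufferAllocator
   import org.apache.arrow.vector.FieldVector
   import org.apache.arrow.vector.dictionary.DictionaryProvider

   object ExportSketch {
     // Hypothetical helper, not part of Comet's API.
     // `consume` receives (arrayAddress, schemaAddress) and is assumed to
     // move the exported contents into native-owned structures.
     def exportAndRelease(
         allocator: BufferAllocator,
         vector: FieldVector,
         provider: DictionaryProvider,
         consume: (Long, Long) => Unit): Unit = {
       val arrowSchema = ArrowSchema.allocateNew(allocator)
       val arrowArray = ArrowArray.allocateNew(allocator)
       try {
         // Fill the C structs so they describe `vector`.
         Data.exportVector(allocator, vector, provider, arrowArray, arrowSchema)
         consume(arrowArray.memoryAddress(), arrowSchema.memoryAddress())
       } finally {
         // close() frees only the JVM-allocated base structures; it does not
         // invoke the release callback of the exported data.
         arrowArray.close()
         arrowSchema.close()
       }
     }
   }
   ```
   
   If `consume` fails before moving the contents, the exported data itself 
would still need to be released, which this sketch does not handle.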
   
   I applied a fix on my fork and the problem went away: 
https://github.com/Kontinuation/datafusion-comet/commit/a90f43a983d234acc7c9a1cf69336b865c5f93ac
   
   
![image](https://github.com/user-attachments/assets/92efe0f3-6c49-4abc-bc2b-6086187402c8)
   
   With the fix, the native memory allocated by `Unsafe_AllocateMemory0` 
becomes pretty small and constant:
   
   ```
   [0x00000001071c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
   [0x00000001170c05b4]
                                (malloc=15386KB type=Other +1395KB #44 -123)
   --
   [0x00000001071c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
   [0x0000000116817be0]
                                (malloc=8418KB type=Other -16KB #36 -5)
   ```
   
   Running TPC-H benchmarks with AQE coalesce partitions enabled still shows 
a memory leak; I'm still investigating it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

