Kontinuation commented on issue #884:
URL: https://github.com/apache/datafusion-comet/issues/884#issuecomment-2316896912
I've investigated the problem and found that the leak is caused by these 2 allocations:

https://github.com/apache/datafusion-comet/blob/0.2.0/common/src/main/scala/org/apache/comet/vector/NativeUtil.scala#L65-L66

```scala
val arrowSchema = ArrowSchema.allocateNew(allocator)
val arrowArray = ArrowArray.allocateNew(allocator)
```

They construct the Arrow C data structures used to transfer Arrow batch vectors from Scala (JVM) to the native executor (Rust). The native executor moves the transferred vectors and takes ownership of them, but the `arrowSchema` and `arrowArray` base structures allocated in the JVM are never released. Each time we transfer a batch from the JVM to the native executor, we leak the memory of these 2 base structures.

I applied a fix on my fork and the problem went away:

https://github.com/Kontinuation/datafusion-comet/commit/a90f43a983d234acc7c9a1cf69336b865c5f93ac

The native memory allocated by `Unsafe_AllocateMemory0` is now fairly small and stays constant:

```
[0x00000001071c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
[0x00000001170c05b4]
                             (malloc=15386KB type=Other +1395KB #44 -123)

[0x00000001071c98a8] Unsafe_AllocateMemory0(JNIEnv_*, _jobject*, long)+0xcc
[0x0000000116817be0]
                             (malloc=8418KB type=Other -16KB #36 -5)
```

Running TPC-H benchmarks with AQE partition coalescing enabled still shows a memory leak; I'm still investigating it.
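To make the ownership split concrete, here is a toy model in Scala. It is not Comet's or Arrow's actual code: `TrackingAllocator`, `BaseStruct`, and `TransferModel` are hypothetical names, and the 72- and 80-byte sizes are those of the C `ArrowSchema` and `ArrowArray` structs on 64-bit platforms. The "native" side moves the payload out of the structs but never frees the structs themselves, so unless the JVM side closes them, outstanding memory grows by 152 bytes per batch. The fixed variant closes both structs in a `finally` block, which is one plausible shape of a fix; the linked commit may differ.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an off-heap allocator: it only tracks how many
// bytes are currently outstanding. This is NOT Arrow's BufferAllocator.
final class TrackingAllocator {
  private var outstanding = 0L
  def allocate(size: Long): Unit = outstanding += size
  def free(size: Long): Unit = outstanding -= size
  def allocatedBytes: Long = outstanding
}

// Hypothetical model of a fixed-size C struct (ArrowSchema / ArrowArray)
// allocated on the JVM side. `payload` stands for the data the struct points to.
final class BaseStruct(allocator: TrackingAllocator, size: Long) extends AutoCloseable {
  allocator.allocate(size)
  var payload: Option[String] = None
  private var closed = false

  // Frees only the struct itself; data already moved elsewhere is untouched.
  override def close(): Unit =
    if (!closed) { allocator.free(size); closed = true }
}

object TransferModel {
  // Sizes of the ArrowSchema and ArrowArray C structs on 64-bit platforms.
  val SchemaSize = 72L
  val ArraySize = 80L

  // Stand-in for the native executor: it takes ownership of the data by
  // moving it out of the structs, but leaves the structs to the caller.
  def nativeImport(schema: BaseStruct, array: BaseStruct, sink: mutable.Buffer[String]): Unit = {
    sink += array.payload.get
    schema.payload = None
    array.payload = None
  }

  // Leaky pattern: two structs allocated per batch, never closed.
  def transferLeaky(alloc: TrackingAllocator, batch: String, sink: mutable.Buffer[String]): Unit = {
    val schema = new BaseStruct(alloc, SchemaSize)
    val array = new BaseStruct(alloc, ArraySize)
    schema.payload = Some("schema")
    array.payload = Some(batch)
    nativeImport(schema, array, sink)
  }

  // Fixed pattern: close both structs once the native side has taken the data.
  def transferFixed(alloc: TrackingAllocator, batch: String, sink: mutable.Buffer[String]): Unit = {
    val schema = new BaseStruct(alloc, SchemaSize)
    val array = new BaseStruct(alloc, ArraySize)
    try {
      schema.payload = Some("schema")
      array.payload = Some(batch)
      nativeImport(schema, array, sink)
    } finally {
      array.close()
      schema.close()
    }
  }
}
```

After 1,000 batches the leaky variant leaves 152,000 bytes outstanding while the fixed variant leaves none. In Arrow Java, `ArrowSchema` and `ArrowArray` implement `AutoCloseable`, so the analogous change would presumably be calling `close()` on them once the native side has imported the batch.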