Hey,

A month ago, someone spotted a memory leak while reading zstd-compressed files with
Hudi:
https://github.com/apache/parquet-mr/pull/982#issuecomment-1376498280

Since then, Spark has merged fixes for 3.2.4, 3.3.3, and 3.4.0:
https://issues.apache.org/jira/browse/SPARK-41952

We are currently on Spark 3.2.4 and Hudi 0.13.1, and we are hitting a similar issue
(massive off-heap memory usage) while scanning very large Hudi tables backed
by zstd.

What is the state of this issue? Is there a patch to apply on the Hudi
side as well, or can I consider it fixed by using Spark 3.2.4?

I have attached a graph from the Uber jvm-profiler to illustrate our current
troubles.

Thanks in advance.
