Hey,

A month ago, someone spotted a memory leak while reading zstd-compressed files with
Hudi:
https://github.com/apache/parquet-mr/pull/982#issuecomment-1376498280

Since then, Spark has merged fixes for 3.2.4, 3.3.3, and 3.4.0:
https://issues.apache.org/jira/browse/SPARK-41952

We are currently on Spark 3.2.4 and Hudi 0.13.1, and we are hitting a similar issue
(massive off-heap memory usage) while scanning very large Hudi tables backed
by zstd.

What is the state of this issue? Is there a patch to apply on the Hudi
side as well, or can I consider it fixed by using Spark 3.2.4?

I have attached a graph from the Uber jvm-profiler to illustrate our current
troubles.

Thanks in advance.
