alexeykudinkin commented on PR #982: URL: https://github.com/apache/parquet-mr/pull/982#issuecomment-1377589036
Totally, @shangxinli.

We have Spark clusters running in production, _ingesting_ from hundreds of Apache Hudi tables (stored as Parquet with Zstd compression) and writing into other tables. We switched from gzip to Zstd slightly over a month ago and started hitting OOM issues almost immediately.

It took us a bit of triaging to zero in on Zstd, but we're now confident that the cause is not a mis-calibration of our configs but a slow-bleeding leak of native memory.

The crux of the problem is a very particular type of job: one that reads a lot of Zstd-compressed Parquet and therefore triggers the affected path. Jobs that do not read Parquet are not affected.