[GitHub] [parquet-mr] alexeykudinkin commented on pull request #982: PARQUET-2160: Close ZstdInputStream to free off-heap memory in time.

GitBox Tue, 10 Jan 2023 09:13:45 -0800


alexeykudinkin commented on PR #982:
URL: https://github.com/apache/parquet-mr/pull/982#issuecomment-1377589036


   Totally @shangxinli  
   
   We have Spark clusters running in production, _ingesting_ from hundreds of
Apache Hudi tables (using Parquet and Zstd) and writing into other ones. We
switched from gzip to zstd slightly over a month ago and started having OOM
issues almost immediately. It took us a bit of triaging to zero in on zstd, but
we're now confident that the cause is not a miscalibration of our configs but a
slow-bleeding leak of native memory.
   
   The crux of the problem is a very particular type of job -- one that reads
a lot of Zstd-compressed Parquet (therefore triggering the affected path).
Jobs that do not read Parquet are not affected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
