nicochen opened a new issue, #2789:
URL: https://github.com/apache/amoro/issues/2789
### What happened?
I use several Flink SQL tasks to ingest data into a Mixed-Hive format table.
The Flink TaskManagers were periodically killed for exceeding the YARN
container memory limit, even though their heap memory consumption was
significantly lower than what was requested at startup.
I used a gprof-style heap profiling tool to trace and gather statistics on how
a TaskManager process requests memory from the OS. The output looks like this:
```
Total: 2297.2 MB
  1516.5  66.0%  66.0%   1516.5  66.0% deflateInit2_
   559.2  24.3%  90.4%    559.3  24.3% os::malloc@905260
   192.9   8.4%  98.8%    192.9   8.4% os::malloc@905400
    11.7   0.5%  99.3%     11.7   0.5% updatewindow
     8.3   0.4%  99.6%      8.3   0.4% readCEN
     4.7   0.2%  99.8%      4.7   0.2% init
     2.5   0.1%  99.9%      2.5   0.1% inflateInit2_
     0.6   0.0% 100.0%   1517.1  66.0% Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init
```
The entry for `Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init`
looked suspicious. After tracing the ZlibCompressor call stacks with the
Arthas tool, I found the root cause: the code never calls `compressor.close`
to release the native memory blocks, and instead allocates a new block from
the OS every time it writes a file (see the sketch below).
I fixed this bug locally and have been running the fix in my environment for
more than 2 months.
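For illustration only, here is a minimal sketch of the pattern behind the fix,
assuming the writer obtains its compressor from Hadoop's `CodecPool`. The
`CompressedFileWriter` class and its `write` method are hypothetical names,
not the actual Amoro code:

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressedFileWriter {

  /**
   * Writes one compressed file. ZlibCompressor's native init (deflateInit2_)
   * allocates off-heap memory for every new Compressor, so if the compressor
   * is never released, each written file leaks a native block and the
   * container eventually exceeds its YARN memory limit.
   */
  public static void write(FileSystem fs, Path path, byte[] data, Configuration conf)
      throws IOException {
    CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
    Compressor compressor = CodecPool.getCompressor(codec);
    try (OutputStream out = codec.createOutputStream(fs.create(path), compressor)) {
      out.write(data);
    } finally {
      // The fix: hand the compressor back to the pool so its native buffers
      // are reused for the next file, instead of dropping the reference and
      // leaking the native memory on every write.
      CodecPool.returnCompressor(compressor);
    }
  }
}
```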
### Affects Versions
master
### What engines are you seeing the problem on?
Flink, Spark
### How to reproduce
Use a large dataset like what I used: about 20,000,000 records per day. In my
tests, the problem is hard to reproduce with small datasets.
### Relevant log output
_No response_
### Anything else
_No response_
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's Code of Conduct