nicochen opened a new issue, #2789:
URL: https://github.com/apache/amoro/issues/2789
### What happened?
I use several Flink SQL tasks to ingest data into a Mixed-Hive format table.
The Flink TaskManagers were periodically killed for exceeding the YARN
container memory limit, even though their heap memory consumption was
significantly lower than what was requested at startup.
I used a gprof-style heap profiling tool to trace and gather statistics on how
a TaskManager process requests memory from the OS. The output looks like this:
```
Total: 2297.2 MB
  1516.5  66.0%  66.0%   1516.5  66.0% deflateInit2_
   559.2  24.3%  90.4%    559.3  24.3% os::malloc@905260
   192.9   8.4%  98.8%    192.9   8.4% os::malloc@905400
    11.7   0.5%  99.3%     11.7   0.5% updatewindow
     8.3   0.4%  99.6%      8.3   0.4% readCEN
     4.7   0.2%  99.8%      4.7   0.2% init
     2.5   0.1%  99.9%      2.5   0.1% inflateInit2_
     0.6   0.0% 100.0%   1517.1  66.0% Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init
```
The entry for `Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_init`
looked suspicious. After tracing the ZlibCompressor call stacks with the
Arthas tool, I found the root cause: the code never calls `compressor.close`
to release the native memory blocks, and instead allocates a new block from
the OS every time it writes a file (see the sketch below).
I fixed this bug locally and have been running the fix in my environment for
more than 2 months.
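For illustration only, here is a minimal sketch of the pattern behind the fix,
assuming the writer obtains its compressor from Hadoop's `CodecPool`. The
`CompressedFileWriter` class and its `write` method are hypothetical names,
not the actual Amoro code:

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressedFileWriter {

  /**
   * Writes one compressed file. ZlibCompressor's native init (deflateInit2_)
   * allocates off-heap memory for every new Compressor, so if the compressor
   * is never released, each written file leaks a native block and the
   * container eventually exceeds its YARN memory limit.
   */
  public static void write(FileSystem fs, Path path, byte[] data, Configuration conf)
      throws IOException {
    CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
    Compressor compressor = CodecPool.getCompressor(codec);
    try (OutputStream out = codec.createOutputStream(fs.create(path), compressor)) {
      out.write(data);
    } finally {
      // The fix: hand the compressor back to the pool so its native buffers
      // are reused for the next file, instead of dropping the reference and
      // leaking the native memory on every write.
      CodecPool.returnCompressor(compressor);
    }
  }
}
```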
### Affects Versions
master
### What engines are you seeing the problem on?
Flink, Spark
### How to reproduce
Use a large dataset like what I used: about 20,000,000 records per day. In my
tests, the problem is hard to reproduce with small datasets.
### Relevant log output
_No response_
### Anything else
_No response_
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's Code of Conduct