[
https://issues.apache.org/jira/browse/HADOOP-12007?focusedWorklogId=792273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792273
]
ASF GitHub Bot logged work on HADOOP-12007:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jul/22 17:37
Start Date: 18/Jul/22 17:37
Worklog Time Spent: 10m
Work Description: kevins-29 opened a new pull request, #4585:
URL: https://github.com/apache/hadoop/pull/4585
### Description of PR
Explicitly call `end()` on `Compressor` or `Decompressor` implementations
annotated with `DoNotPool` when they are returned to the `CodecPool`.
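
To illustrate the idea, here is a minimal, self-contained sketch of the return path. It does not use the real Hadoop classes: `DoNotPool`, `Compressor`, `TrackingCompressor`, `UnpooledCompressor`, `PooledCompressor` and `SimpleCodecPool` are reduced stand-ins written for this example, and the actual patch in the PR may differ in detail.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-in for Hadoop's DoNotPool marker annotation (assumption: runtime-visible).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DoNotPool {}

// Reduced stand-in for Hadoop's Compressor interface.
interface Compressor {
    void reset(); // prepare the instance for reuse
    void end();   // release the resources held by the instance
}

// Hypothetical helper that records which lifecycle methods were called.
class TrackingCompressor implements Compressor {
    boolean resetCalled;
    boolean ended;
    @Override public void reset() { resetCalled = true; }
    @Override public void end() { ended = true; }
}

// A compressor that must never be pooled.
@DoNotPool
class UnpooledCompressor extends TrackingCompressor {}

// A compressor that may be reused.
class PooledCompressor extends TrackingCompressor {}

// Hypothetical, simplified pool; only the return path is modelled.
final class SimpleCodecPool {
    private final Deque<Compressor> pool = new ArrayDeque<>();

    void returnCompressor(Compressor compressor) {
        if (compressor == null) {
            return;
        }
        if (compressor.getClass().isAnnotationPresent(DoNotPool.class)) {
            // The instance will not be reused, so release it explicitly
            // instead of silently dropping the reference.
            compressor.end();
            return;
        }
        compressor.reset();
        pool.push(compressor);
    }

    int size() {
        return pool.size();
    }
}
```

In this sketch, returning an `UnpooledCompressor` ends it and leaves the pool empty, while returning a `PooledCompressor` resets it and keeps it for reuse. A `Decompressor` path would presumably follow the same shape.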
### How was this patch tested?
I created the following
[project](https://github.com/kevins-29/hadoop-gzip-memory-leak) to demo the
leak. You can run the demo with
```shell
./gradlew run
```
and then monitor the memory usage using
```shell
while true; do echo \"$(date +%Y-%m-%d' '%H:%M:%S)\",$(pmap -x <PID> | grep "total kB" | awk '{print $4}'); sleep 10; done;
```
### Results - Before Patch
```
"2022-07-18 03:21:49",1113060
"2022-07-18 03:22:00",1126184
"2022-07-18 03:22:10",1126248
"2022-07-18 03:22:20",1126248
"2022-07-18 03:22:30",1130204
"2022-07-18 03:22:40",1130216
"2022-07-18 03:22:50",1130244
"2022-07-18 03:23:00",1130776
"2022-07-18 03:23:10",1130776
"2022-07-18 03:23:20",1130776
"2022-07-18 03:23:30",1130776
"2022-07-18 03:23:40",1130888
"2022-07-18 03:23:50",1130888
"2022-07-18 03:24:00",1130888
"2022-07-18 03:24:10",1130928
"2022-07-18 03:24:20",1130928
"2022-07-18 03:24:30",1130928
"2022-07-18 03:24:40",1131204
"2022-07-18 03:24:50",1131204
"2022-07-18 03:25:00",1131204
"2022-07-18 03:25:10",1131204
"2022-07-18 03:25:20",1139044
"2022-07-18 03:25:30",1140900
"2022-07-18 03:25:40",1140900
"2022-07-18 03:25:50",1140900
"2022-07-18 03:26:00",1140900
"2022-07-18 03:26:10",1141164
"2022-07-18 03:26:20",1141164
"2022-07-18 03:26:30",1141164
"2022-07-18 03:26:40",1141164
"2022-07-18 03:26:50",1141164
"2022-07-18 03:27:00",1141164
"2022-07-18 03:27:10",1141164
```
### Results - After Patch
```
"2022-07-18 03:34:36",1098112
"2022-07-18 03:34:46",1098112
"2022-07-18 03:34:56",1098204
"2022-07-18 03:35:06",1098152
"2022-07-18 03:35:16",1098152
"2022-07-18 03:35:26",1098172
"2022-07-18 03:35:36",1098172
"2022-07-18 03:35:46",1098172
"2022-07-18 03:35:57",1098172
"2022-07-18 03:36:07",1098268
"2022-07-18 03:36:17",1098268
"2022-07-18 03:36:27",1098268
"2022-07-18 03:36:37",1098292
"2022-07-18 03:36:47",1098292
"2022-07-18 03:36:57",1098292
"2022-07-18 03:37:07",1098320
"2022-07-18 03:37:17",1098320
"2022-07-18 03:37:27",1098320
"2022-07-18 03:37:37",1098320
"2022-07-18 03:37:47",1098320
"2022-07-18 03:37:57",1098340
"2022-07-18 03:38:07",1098340
"2022-07-18 03:38:17",1098340
```
Issue Time Tracking
-------------------
Worklog Id: (was: 792273)
Remaining Estimate: 0h
Time Spent: 10m
> GzipCodec native CodecPool leaks memory
> ---------------------------------------
>
> Key: HADOOP-12007
> URL: https://issues.apache.org/jira/browse/HADOOP-12007
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Yejun Yang
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> org/apache/hadoop/io/compress/GzipCodec.java calls
> CompressionCodec.Util.createOutputStreamWithCodecPool to use CodecPool, but
> compressor objects are never actually returned to the pool, which causes a
> memory leak.
> HADOOP-10591 uses CompressionOutputStream.close() to return the Compressor
> object to the pool. But CompressionCodec.Util.createOutputStreamWithCodecPool
> actually returns a CompressorStream, which overrides close().
> As a result, CodecPool.returnCompressor is never called. In my log file I
> can see lots of "Got brand-new compressor [.gz]" but no "Got recycled
> compressor".