[ 
https://issues.apache.org/jira/browse/HADOOP-12007?focusedWorklogId=792273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792273
 ]

ASF GitHub Bot logged work on HADOOP-12007:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jul/22 17:37
            Start Date: 18/Jul/22 17:37
    Worklog Time Spent: 10m 
      Work Description: kevins-29 opened a new pull request, #4585:
URL: https://github.com/apache/hadoop/pull/4585

   ### Description of PR
   Explicitly call `end()` when a `Compressor` or `Decompressor` implementation annotated with `DoNotPool` is returned to the `CodecPool`, so that its native resources are released instead of leaked.
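
A minimal sketch of the idea, not the actual patch: the types below (`NoPooling`, `NativeResource`, `SketchPool` and the two compressor classes) are hypothetical stand-ins for `DoNotPool`, `Compressor` and `CodecPool`, so the example compiles without Hadoop on the classpath. It only illustrates the rule that an object which will not be pooled should have `end()` called on it before it is dropped.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins for illustration; these are NOT the Hadoop types.
@Retention(RetentionPolicy.RUNTIME)   // must be visible to reflection at runtime
@interface NoPooling {}

interface NativeResource {
    void reset();   // prepare the object for reuse
    void end();     // release native (off-heap) memory
}

class PooledCompressor implements NativeResource {
    boolean ended = false;
    public void reset() {}
    public void end() { ended = true; }
}

@NoPooling
class OneShotCompressor implements NativeResource {
    boolean ended = false;
    public void reset() {}
    public void end() { ended = true; }
}

class SketchPool {
    private final Deque<NativeResource> idle = new ArrayDeque<>();

    void giveBack(NativeResource r) {
        if (r == null) {
            return;
        }
        // An annotated object is never reused, so free its native memory now
        // rather than waiting for the garbage collector to notice it.
        if (r.getClass().isAnnotationPresent(NoPooling.class)) {
            r.end();
            return;
        }
        r.reset();
        idle.push(r);
    }

    int idleCount() {
        return idle.size();
    }
}

class SketchPoolDemo {
    public static void main(String[] args) {
        SketchPool pool = new SketchPool();
        OneShotCompressor oneShot = new OneShotCompressor();
        PooledCompressor pooled = new PooledCompressor();
        pool.giveBack(oneShot);
        pool.giveBack(pooled);
        System.out.println("one-shot ended: " + oneShot.ended);
        System.out.println("idle in pool:   " + pool.idleCount());
    }
}
```

Without the `r.end()` call, the annotated object would simply be discarded, and any memory it holds outside the Java heap would stay allocated until finalization, if it is released at all.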
   
   ### How was this patch tested?
   I created the following [project](https://github.com/kevins-29/hadoop-gzip-memory-leak) to demonstrate the leak. Run the demo with:
   
   ```shell
   ./gradlew run
   ```
   
   and then monitor the process's resident memory (the RSS column of `pmap -x`, in kB), sampled every 10 seconds, using
   
   ```shell
   while true; do echo \"$(date +%Y-%m-%d' '%H:%M:%S)\",$(pmap -x <PID> | grep "total kB" | awk '{print $4}'); sleep 10; done;
   ```
   
   ### Results - Before Patch
   
   ```
   "2022-07-18 03:21:49",1113060
   "2022-07-18 03:22:00",1126184
   "2022-07-18 03:22:10",1126248
   "2022-07-18 03:22:20",1126248
   "2022-07-18 03:22:30",1130204
   "2022-07-18 03:22:40",1130216
   "2022-07-18 03:22:50",1130244
   "2022-07-18 03:23:00",1130776
   "2022-07-18 03:23:10",1130776
   "2022-07-18 03:23:20",1130776
   "2022-07-18 03:23:30",1130776
   "2022-07-18 03:23:40",1130888
   "2022-07-18 03:23:50",1130888
   "2022-07-18 03:24:00",1130888
   "2022-07-18 03:24:10",1130928
   "2022-07-18 03:24:20",1130928
   "2022-07-18 03:24:30",1130928
   "2022-07-18 03:24:40",1131204
   "2022-07-18 03:24:50",1131204
   "2022-07-18 03:25:00",1131204
   "2022-07-18 03:25:10",1131204
   "2022-07-18 03:25:20",1139044
   "2022-07-18 03:25:30",1140900
   "2022-07-18 03:25:40",1140900
   "2022-07-18 03:25:50",1140900
   "2022-07-18 03:26:00",1140900
   "2022-07-18 03:26:10",1141164
   "2022-07-18 03:26:20",1141164
   "2022-07-18 03:26:30",1141164
   "2022-07-18 03:26:40",1141164
   "2022-07-18 03:26:50",1141164
   "2022-07-18 03:27:00",1141164
   "2022-07-18 03:27:10",1141164
   ```
   
   ### Results - After Patch
   ```
   "2022-07-18 03:34:36",1098112
   "2022-07-18 03:34:46",1098112
   "2022-07-18 03:34:56",1098204
   "2022-07-18 03:35:06",1098152
   "2022-07-18 03:35:16",1098152
   "2022-07-18 03:35:26",1098172
   "2022-07-18 03:35:36",1098172
   "2022-07-18 03:35:46",1098172
   "2022-07-18 03:35:57",1098172
   "2022-07-18 03:36:07",1098268
   "2022-07-18 03:36:17",1098268
   "2022-07-18 03:36:27",1098268
   "2022-07-18 03:36:37",1098292
   "2022-07-18 03:36:47",1098292
   "2022-07-18 03:36:57",1098292
   "2022-07-18 03:37:07",1098320
   "2022-07-18 03:37:17",1098320
   "2022-07-18 03:37:27",1098320
   "2022-07-18 03:37:37",1098320
   "2022-07-18 03:37:47",1098320
   "2022-07-18 03:37:57",1098340
   "2022-07-18 03:38:07",1098340
   "2022-07-18 03:38:17",1098340
   ```
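
To make the difference between the two runs concrete, the following sketch computes the growth between the first and last sample of each table above. The class and method names (`RssGrowth`, `parseKb`, `growthKb`) are made up for this illustration, and the values are assumed to be kilobytes of resident memory as printed by the monitoring loop.

```java
import java.util.Arrays;
import java.util.List;

class RssGrowth {
    // Extracts the kB value from a sample line of the form "timestamp",kilobytes.
    static long parseKb(String line) {
        String trimmed = line.trim();
        return Long.parseLong(trimmed.substring(trimmed.lastIndexOf(',') + 1).trim());
    }

    // Growth between the first and the last sample, in kB.
    static long growthKb(List<String> samples) {
        return parseKb(samples.get(samples.size() - 1)) - parseKb(samples.get(0));
    }

    public static void main(String[] args) {
        // First and last samples copied from the two result tables.
        List<String> before = Arrays.asList(
            "\"2022-07-18 03:21:49\",1113060",
            "\"2022-07-18 03:27:10\",1141164");
        List<String> after = Arrays.asList(
            "\"2022-07-18 03:34:36\",1098112",
            "\"2022-07-18 03:38:17\",1098340");
        System.out.println("before patch: +" + growthKb(before) + " kB"); // +28104 kB
        System.out.println("after patch:  +" + growthKb(after) + " kB");  // +228 kB
    }
}
```

Over a few minutes, the unpatched run grows by roughly 27 MB while the patched run stays essentially flat. The two runs are not the same length, so the totals are indicative rather than a strict like-for-like comparison.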
   
Issue Time Tracking
-------------------

            Worklog Id:     (was: 792273)
    Remaining Estimate: 0h
            Time Spent: 10m

> GzipCodec native CodecPool leaks memory
> ---------------------------------------
>
>                 Key: HADOOP-12007
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12007
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Yejun Yang
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> org/apache/hadoop/io/compress/GzipCodec.java calls
> CompressionCodec.Util.createOutputStreamWithCodecPool to use CodecPool, but
> the compressor objects are never actually returned to the pool, which causes
> a memory leak.
> HADOOP-10591 uses CompressionOutputStream.close() to return the Compressor
> object to the pool, but CompressionCodec.Util.createOutputStreamWithCodecPool
> actually returns a CompressorStream, which overrides close().
> As a result, CodecPool.returnCompressor is never called. In my log file I
> can see lots of "Got brand-new compressor [.gz]" messages but no "Got
> recycled compressor" messages.
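
The mechanism described in the issue can be sketched in isolation. This is an illustration of the general pattern only: `TrackingPool`, `BaseStream`, `LeakyStream` and `ChainedStream` are hypothetical names, not the Hadoop classes, and the sketch assumes (as the report states) that the overriding `close()` never reaches the base-class logic that returns the compressor.

```java
// Hypothetical stand-ins; none of these are the real Hadoop classes.
class TrackingPool {
    int returned = 0;                       // how many compressors came back

    void giveBack(Object compressor) {
        returned++;
    }
}

class BaseStream {
    protected final TrackingPool pool;
    protected final Object compressor = new Object();

    BaseStream(TrackingPool pool) {
        this.pool = pool;
    }

    // The base class is the only place that hands the compressor back.
    public void close() {
        pool.giveBack(compressor);
    }
}

class LeakyStream extends BaseStream {
    LeakyStream(TrackingPool pool) {
        super(pool);
    }

    @Override
    public void close() {
        // Does its own cleanup but never reaches BaseStream.close(),
        // so the compressor is never returned.
    }
}

class ChainedStream extends BaseStream {
    ChainedStream(TrackingPool pool) {
        super(pool);
    }

    @Override
    public void close() {
        try {
            // Own cleanup would go here.
        } finally {
            super.close();                  // compressor goes back to the pool
        }
    }
}

class CloseDemo {
    public static void main(String[] args) {
        TrackingPool pool = new TrackingPool();
        new LeakyStream(pool).close();
        System.out.println("after leaky close:   " + pool.returned);
        new ChainedStream(pool).close();
        System.out.println("after chained close: " + pool.returned);
    }
}
```

A pool that never receives anything back can only hand out new objects, which would match the "Got brand-new compressor" messages in the reporter's log.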



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
