[ 
https://issues.apache.org/jira/browse/HDDS-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906118#comment-17906118
 ] 

Wei-Chiu Chuang commented on HDDS-11939:
----------------------------------------

Note: if it's Netty, Netty manages its own memory and does not depend on the 
JVM heap. The direct memory it uses is external to heap memory and can grow to 
as much as the maximum heap size itself, unless you specify the direct memory 
size explicitly.
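
A minimal sketch of capping direct memory separately from the heap, assuming 
the standard HotSpot flag -XX:MaxDirectMemorySize is the explicit setting meant 
above. The sizes and the JVM_OPTS variable name are hypothetical and only 
illustrate the shape of the option string:

```shell
#!/usr/bin/env bash
# Hypothetical sizes for illustration only; tune them for your own hosts.
HEAP_MAX="8g"
DIRECT_MAX="4g"

# -Xmx bounds the Java heap; -XX:MaxDirectMemorySize bounds direct (off-heap)
# buffers separately, so the two limits add up in the process footprint.
JVM_OPTS="-Xmx${HEAP_MAX} -XX:MaxDirectMemorySize=${DIRECT_MAX}"
echo "${JVM_OPTS}"
```

With both limits set, direct memory that keeps growing hits its own ceiling 
instead of growing as large as the heap.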

For Netty memory leak issues, please supply the option 
-Dio.netty.leakDetectionLevel=paranoid as well as 
-Dorg.apache.ratis.thirdparty.io.netty.leakDetectionLevel=paranoid. 
If that is too slow, use advanced instead.

These two options make Netty detect memory leaks and report them (I don't 
recall whether the report goes to stdout or stderr).
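
A minimal sketch of passing both flags to a datanode JVM. It assumes the daemon 
reads extra JVM options from an environment variable named OZONE_DATANODE_OPTS; 
that name is an assumption, so check the env script of your deployment for the 
variable it actually uses:

```shell
#!/usr/bin/env bash
# Leak detection level: "paranoid" is the most thorough and the slowest;
# switch to "advanced" if the overhead is too high.
LEVEL="paranoid"

# One flag for plain Netty, one for the Netty copy shaded into Ratis.
LEAK_OPTS="-Dio.netty.leakDetectionLevel=${LEVEL}"
LEAK_OPTS="${LEAK_OPTS} -Dorg.apache.ratis.thirdparty.io.netty.leakDetectionLevel=${LEVEL}"

# Assumption: OZONE_DATANODE_OPTS is the variable your deployment reads.
# Any options already present are kept and the new flags are appended.
export OZONE_DATANODE_OPTS="${OZONE_DATANODE_OPTS:-} ${LEAK_OPTS}"
echo "${OZONE_DATANODE_OPTS}"
```

Both flags are needed because the shaded copy reads its own system property 
prefix, so setting only the plain io.netty property would not cover buffers 
allocated through Ratis.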

> Ratis Memory leak of DataStreamMapImpl when Stream write
> --------------------------------------------------------
>
>                 Key: HDDS-11939
>                 URL: https://issues.apache.org/jira/browse/HDDS-11939
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: ChenXi
>            Priority: Critical
>         Attachments: image-2024-12-15-21-14-27-612.png, 
> image-2024-12-15-22-12-41-704.png, image-2024-12-15-22-14-22-904.png, 
> image-2024-12-15-22-18-24-185.png, image-2024-12-15-22-21-26-187.png, 
> image-2024-12-15-22-35-16-703.png
>
>
> h2. Phenomena
> We found a memory leak in DataStreamMapImpl when writing in Streaming mode 
> on our online cluster.
> This problem can be reproduced on the current master branch.
> Even long after write requests have stopped, DataStreamMapImpl still holds a 
> very large number of DataStream objects. They keep accumulating and cannot 
> be released unless the process is restarted.
> !image-2024-12-15-22-21-26-187.png|width=993,height=758!
> !image-2024-12-15-21-14-27-612.png|width=1209,height=474!
> h2. Reproduction method
> h3. Starting the cluster in Streaming mode
> Refer to: 
> [https://ozone.apache.org/docs/edge/feature/streaming-write-pipeline.html]
> h3. Execute a command
>  
> {code:bash}
> for i in `seq 1 10`; do
>   ozone freon ommg --operation CREATE_STREAM_FILE -n 100 -t 100 \
>     --size=1M --volume s3v --bucket bucket1 --duration 5
> done
> {code}
> *Another way to reproduce:* 
> use timeout to forcibly end the write. This leaks even more, even when only 
> one thread is writing.
> {code:bash}
> for i in `seq 1 10`; do
>   timeout 10 ozone freon ommg --operation CREATE_STREAM_FILE -n 100 -t 1 \
>     --size=1M --volume s3v --bucket bucket1 --duration 100
> done
> {code}
>  
>  
> Note: -t 100 sets the number of client threads to 100. This is the key to 
> reproducing the leak with the first command: the client must be 
> multi-threaded.
> h3. DataStreamMapImpl
> At this point some DataStream objects have been left behind in 
> DataStreamMapImpl, but because nothing is logged they cannot be observed 
> directly.
> I added a log statement in DataStreamMapImpl#remove to make this easier to 
> observe. These are screenshots of my logs:
> !image-2024-12-15-22-35-16-703.png|width=851,height=382!
> !image-2024-12-15-22-18-24-185.png|width=1816,height=920!
> h3. Netty's direct memory
> Netty's direct memory also keeps growing and never drops.
> Before
> !image-2024-12-15-22-12-41-704.png|width=675,height=120!
> After
> !image-2024-12-15-22-14-22-904.png|width=735,height=108!
>  
>  
> h3. Other
> The `cleanUpOnChannelInactive` method does not clean up a leaked `DataStream` 
> in `DataStreamMapImpl`.
> With a single-threaded client test, the leak does not occur:
>  
> {code:bash}
> for i in `seq 1 10`; do
>   ozone freon ommg --operation CREATE_STREAM_FILE -n 100 -t 1 \
>     --size=1M --volume s3v --bucket bucket1 --duration 5
> done
> {code}
>  


