[ 
https://issues.apache.org/jira/browse/HDDS-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905897#comment-17905897
 ] 

Ivan Andika edited comment on HDDS-11939 at 12/16/24 6:14 AM:
--------------------------------------------------------------

[~jianghuazhu] [~XiChen] I think it might not be related. The zero copy that 
[~jianghuazhu] mentioned probably refers to the gRPC zero-copy feature 
(https://issues.apache.org/jira/browse/RATIS-1931). I'm not sure we are 
using it right now, since there are still some issues that need to be resolved.

The problem seems to be related to Ratis streaming 
(https://issues.apache.org/jira/browse/RATIS-979), which also uses zero-copy 
techniques.


was (Author: JIRAUSER298977):
[~jianghuazhu] [~XiChen] I think it might not related. The zero copy 
[~jianghuazhu]  should be related to the gRPC zero copy 
(https://issues.apache.org/jira/browse/RATIS-1931) feature. I'm not sure we are 
using it right now since there are still some issues that need to be resolved.

The problem seems to be related to Ratis streaming 
https://issues.apache.org/jira/browse/RATIS-979 which also uses zero copy 
techniques.

> Ratis Memory leak of DataStreamMapImpl when Stream write
> --------------------------------------------------------
>
>                 Key: HDDS-11939
>                 URL: https://issues.apache.org/jira/browse/HDDS-11939
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: ChenXi
>            Priority: Critical
>         Attachments: image-2024-12-15-21-14-27-612.png, 
> image-2024-12-15-22-12-41-704.png, image-2024-12-15-22-14-22-904.png, 
> image-2024-12-15-22-18-24-185.png, image-2024-12-15-22-21-26-187.png, 
> image-2024-12-15-22-35-16-703.png
>
>
> h2. Phenomena
> We found a memory leak in DataStreamMapImpl when writing with streaming 
> enabled on our online cluster.
> The problem can be reproduced on the current master branch.
> Even long after write requests have stopped, DataStreamMapImpl still holds a 
> very large number of DataStream objects. They are never released and keep 
> accumulating until the process is restarted.
> !image-2024-12-15-22-21-26-187.png|width=993,height=758!
> !image-2024-12-15-21-14-27-612.png|width=1209,height=474!
> h2. Reproduction method
> h3. Starting the cluster in Streaming mode
> Refer to: 
> [https://ozone.apache.org/docs/edge/feature/streaming-write-pipeline.html]
> h3. execute a command
>  
> {code:bash}
> for i in $(seq 1 10); do
>   ozone freon ommg --operation CREATE_STREAM_FILE -n 100 -t 100 --size=1M \
>     --volume s3v --bucket bucket1 --duration 5
> done
> {code}
> Note: -t 100 sets the number of client threads to 100. This is the key to 
> reproducing the leak: it only occurs with a multi-threaded client.
> h3. DataStreamMapImpl
> At this point some DataStream objects have been left behind in 
> DataStreamMapImpl, but because nothing is logged, this cannot be observed 
> directly.
> I added a log statement in DataStreamMapImpl#remove to make it observable. 
> These are screenshots of my logs:
> !image-2024-12-15-22-35-16-703.png|width=851,height=382!
> !image-2024-12-15-22-18-24-185.png|width=1816,height=920!
> h3. Netty's direct memory
> Netty's direct memory also keeps growing and never drops.
> Before
> !image-2024-12-15-22-12-41-704.png|width=675,height=120!
> After
> !image-2024-12-15-22-14-22-904.png|width=735,height=108!
>  
>  
> h3. Other
> The `cleanUpOnChannelInactive` method does not clean up the leaked `DataStream` 
> objects in `DataStreamMapImpl`.
> With a single-threaded client, the leak does not occur:
>  
> {code:bash}
> for i in $(seq 1 10); do
>   ozone freon ommg --operation CREATE_STREAM_FILE -n 100 -t 1 --size=1M \
>     --volume s3v --bucket bucket1 --duration 5
> done
> {code}
>  
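
The observation technique described in the DataStreamMapImpl section (logging on removal so that entries which are never removed become visible) can be sketched roughly as below. This is only an illustration under stated assumptions: `TrackedStreamMap`, its methods, and the string values are hypothetical stand-ins, not the Ratis `DataStreamMapImpl` code, and the skipped removals in `main` merely simulate a close path that never reaches `remove`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-stream registry with logging; NOT the Ratis DataStreamMapImpl. */
final class TrackedStreamMap {
  private final ConcurrentMap<Long, String> streams = new ConcurrentHashMap<>();
  private final AtomicLong added = new AtomicLong();
  private final AtomicLong removed = new AtomicLong();

  /** Registers a stream under its id and logs the resulting map size. */
  void put(long streamId, String stream) {
    if (streams.putIfAbsent(streamId, stream) == null) {
      added.incrementAndGet();
    }
    System.err.printf("put    id=%d size=%d%n", streamId, streams.size());
  }

  /** Removes a stream; the log line is what makes missing removals visible. */
  boolean remove(long streamId) {
    boolean existed = streams.remove(streamId) != null;
    if (existed) {
      removed.incrementAndGet();
    }
    System.err.printf("remove id=%d existed=%b size=%d%n",
        streamId, existed, streams.size());
    return existed;
  }

  /** Number of entries currently held. */
  int size() {
    return streams.size();
  }

  /** Entries that were added but never removed. */
  long outstanding() {
    return added.get() - removed.get();
  }

  public static void main(String[] args) {
    TrackedStreamMap map = new TrackedStreamMap();
    for (long id = 0; id < 10; id++) {
      map.put(id, "stream-" + id);
    }
    // Simulated bug: only even ids reach remove(); odd ids stay in the map.
    for (long id = 0; id < 10; id += 2) {
      map.remove(id);
    }
    System.out.println("entries still held: " + map.size());
  }
}
```

Comparing the number of `put` and `remove` log lines after the workload has stopped shows how many entries were never released, which is the same comparison the log screenshots in the description are used for.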



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
