[ 
https://issues.apache.org/jira/browse/HDDS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760083#comment-17760083
 ] 

Janus Chow commented on HDDS-7543:
----------------------------------

Hello [~duongnguyen], may I ask about the progress of this ticket? Are we 
planning to include it in the 1.4.0 release?

> SCM memory optimization
> -----------------------
>
>                 Key: HDDS-7543
>                 URL: https://issues.apache.org/jira/browse/HDDS-7543
>             Project: Apache Ozone
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>            Reporter: Duong
>            Priority: Critical
>         Attachments: Screen Shot 2022-12-07 at 5.28.14 PM.png
>
>
> I recently ran a simulation test to stress SCM handling of 5K nodes and 66 
> million containers (which amounts to about an exabyte of raw data usage: 
> 66M * 5 GB * replication factor 3 = ~1 exabyte). The test shows that SCM 
> spends around 1.5 KB of memory per container on the metadata cache (or 
> 150 GB per 100M containers).
> !Screen Shot 2022-12-07 at 5.28.14 PM.png|width=909,height=1032!
> Also, GC count and cost grow linearly with the number of containers.
> This ticket tracks all the efforts to micro-optimize SCM memory usage, both 
> long-term (cache) and short-term (such as temporary variables and protobuf 
> serialized objects).
> Micro-optimizations can sound a bit tedious, but below are some numbers to 
> take into consideration, given that SCM is not horizontally scalable.
>  * In the context of 100M containers, every 10 bytes saved in the 
> container-related cache is 1 GB of RAM saved.
>  * 5K datanodes result in 10K heartbeats per minute, hence ~167 per second 
> consistently. That is not a lot, but in an actively updated cluster, the 
> heartbeat message size and the indicated workload introduce significant work 
> for GC. 
>  
> A heap dump of SCM with 330K containers (each with 3 replicas) can be found 
> [here|https://drive.google.com/file/d/1Z-oOKKvb2yfVUzMqwpmu7CKZGe7IKYdK/view?usp=sharing].
> A corresponding heap dump for Recon is 
> [here|https://drive.google.com/file/d/1CCboGRd1CIQxUVuXb3yZD1PqpxSjFV4O/view?usp=sharing].
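
The sizing figures quoted in the description can be checked with a short 
back-of-envelope calculation. The sketch below is only an illustration: the 
function names are hypothetical, and it assumes decimal units (1 KB = 10^3 
bytes, 1 GB = 10^9 bytes, 1 EB = 10^18 bytes), which the ticket does not 
state explicitly.

```python
# Back-of-envelope check of the numbers quoted in the issue description.
# Assumption: decimal units; the ticket does not say decimal or binary.
KB = 10**3
GB = 10**9
EB = 10**18


def raw_usage_bytes(containers, container_size_bytes, replication):
    """Raw storage consumed by all replicas of all containers."""
    return containers * container_size_bytes * replication


def cache_bytes(containers, bytes_per_container):
    """Memory held by a per-container cache of the given entry size."""
    return containers * bytes_per_container


def heartbeats_per_second(heartbeats_per_minute):
    """Convert a per-minute heartbeat rate into a per-second rate."""
    return heartbeats_per_minute / 60


if __name__ == "__main__":
    # 66M containers * 5 GB * 3 replicas, expressed in exabytes
    print(raw_usage_bytes(66_000_000, 5 * GB, 3) / EB)
    # 100M containers * 1.5 KB each, expressed in gigabytes
    print(cache_bytes(100_000_000, 1500) / GB)
    # 100M containers * 10 bytes saved each, expressed in gigabytes
    print(cache_bytes(100_000_000, 10) / GB)
    # 10K heartbeats per minute, expressed per second
    print(heartbeats_per_second(10_000))
```

Under these assumptions the quoted figures are mutually consistent: roughly 
1 EB of raw usage, 150 GB of cache per 100M containers, 1 GB saved per 10 
bytes trimmed, and about 167 heartbeats per second. The 10K heartbeats per 
minute from 5K datanodes would correspond to two heartbeats per node per 
minute.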



