[ https://issues.apache.org/jira/browse/HDDS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duong updated HDDS-7543:
------------------------
Description:
I recently ran a simulation test to stress SCM with 5K nodes and a hundred
million containers. The test shows that SCM seems to spend around 1.5 KB of
memory per container on its metadata cache, hence about 150 GB for 100M
containers.
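As a back-of-envelope check of the figure above, the per-container cost multiplies out as follows. This is a minimal sketch, not SCM code: the class and method names are hypothetical, the 1.5 KB figure comes from the simulation, and decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes) are assumed.

```java
// Hypothetical back-of-envelope estimate of SCM container-cache memory.
// Assumptions: decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes) and
// ~1.5 KB of cache per container, as observed in the simulation.
public class ScmCacheEstimate {

    public static final long BYTES_PER_GB = 1_000_000_000L;

    /** Total cache footprint in bytes for a given per-container cost. */
    public static long cacheBytes(long bytesPerContainer, long containers) {
        return bytesPerContainer * containers;
    }

    public static void main(String[] args) {
        long containers = 100_000_000L; // 100M containers
        long perContainer = 1_500L;     // ~1.5 KB per container
        long total = cacheBytes(perContainer, containers);
        System.out.println(total / BYTES_PER_GB + " GB"); // prints "150 GB"
    }
}
```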
TODO: put graphs
GC count and cost also grow linearly with the number of containers.
TODO: put graphs.
This ticket tracks all efforts to micro-optimize SCM memory usage, both
long-term (caches) and short-term (e.g. temporary variables, serialized
protobuf objects).
Micro-optimizations may sound tedious, but the numbers below are worth
considering, given that SCM is not horizontally scalable.
* With 100M containers, every 10 bytes saved in the container-related cache
saves 1 GB of RAM.
* 5K datanodes produce 10K heartbeats per minute, i.e. a steady ~167 per
second. That is not a lot by itself, but in an actively updated cluster, the
heartbeat message size and the resulting processing put significant pressure
on the GC.
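The two bullet points reduce to simple arithmetic, sketched below under stated assumptions: decimal units, and one heartbeat per datanode every 30 seconds, which is what 10K heartbeats per minute from 5K datanodes implies. Class and method names are hypothetical; this is not Ozone code.

```java
// Hypothetical helper reproducing the scale numbers from the ticket.
// Assumptions: decimal units (1 GB = 10^9 bytes) and one heartbeat per
// datanode every 30 seconds (implied by 5K datanodes -> 10K heartbeats/min).
public class ScmScaleNumbers {

    public static final long BYTES_PER_GB = 1_000_000_000L;

    /** Heap bytes saved by trimming bytesSaved from every cached container. */
    public static long savedBytes(long bytesSaved, long containers) {
        return bytesSaved * containers;
    }

    /** Heartbeats per minute SCM receives from all datanodes. */
    public static long heartbeatsPerMinute(long datanodes, long intervalSeconds) {
        // Multiply before dividing so whole-minute intervals stay exact.
        return datanodes * 60 / intervalSeconds;
    }

    /** Average heartbeats per second SCM receives. */
    public static double heartbeatsPerSecond(long datanodes, long intervalSeconds) {
        return heartbeatsPerMinute(datanodes, intervalSeconds) / 60.0;
    }

    public static void main(String[] args) {
        long containers = 100_000_000L; // 100M containers
        System.out.println("Saved: "
            + savedBytes(10, containers) / BYTES_PER_GB + " GB");   // prints "Saved: 1 GB"
        System.out.println("Heartbeats/min: "
            + heartbeatsPerMinute(5_000, 30));                       // prints "Heartbeats/min: 10000"
        System.out.println("Heartbeats/s: ~"
            + Math.round(heartbeatsPerSecond(5_000, 30)));           // prints "Heartbeats/s: ~167"
    }
}
```

The per-second figure is an average; actual heartbeats are not guaranteed to arrive evenly spread.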
was:
I recently ran a simulation test to stress SCM with 5K nodes and a hundred
million containers. The test shows that SCM seems to spend around 1.5 KB of
memory per container on its metadata cache, hence about 150 GB for 100M
containers.
TODO: put graphs
GC count and cost also grow linearly with the number of containers.
TODO: put graphs.
This ticket tracks all efforts to micro-optimize SCM memory usage, both
long-term (caches) and short-term (e.g. temporary variables, serialized
protobuf objects).
> SCM memory optimization
> -----------------------
>
> Key: HDDS-7543
> URL: https://issues.apache.org/jira/browse/HDDS-7543
> Project: Apache Ozone
> Issue Type: Improvement
> Affects Versions: 1.3.0
> Reporter: Duong
> Priority: Major
>