[jira] [Commented] (HDDS-13608) Support Ozone container merge

Ivan Andika (Jira) Wed, 04 Feb 2026 22:42:24 -0800


    [ 
https://issues.apache.org/jira/browse/HDDS-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18056577#comment-18056577
 ]


Ivan Andika commented on HDDS-13608:
------------------------------------

For example, one of our clusters has around 15,000,000 (15 million) containers
and 14.5PB of total data. Ideally, every container would be closed at a size of
"hdds.container.close.threshold" * "ozone.scm.container.size" (0.9 * 5GB). Based
on the calculation 14.5PB / (0.9 * 5GB), there should be only about 3,378,745
(about 3.4 million) containers, which is around 22% of the current container
count. This means that we store and handle roughly 4.4x more containers than
necessary.
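The estimate can be reproduced with a short calculation. It assumes binary units (1 PB = 2^50 bytes, 1 GB = 2^30 bytes), which is what yields 3,378,745. With decimal units the result would be about 3.22 million. `ideal_container_count` is a hypothetical helper for illustration and is not part of Ozone.

```python
from fractions import Fraction
import math

# Assumption: binary units, which reproduce the 3,378,745 figure above.
GB = 2**30
PB = 2**50


def ideal_container_count(total_bytes, container_size_bytes, close_threshold):
    """Hypothetical helper: the smallest number of containers that can hold
    total_bytes if every container is closed exactly at
    close_threshold * container_size_bytes. Fractions keep the arithmetic exact."""
    target = Fraction(close_threshold) * container_size_bytes
    return math.ceil(Fraction(total_bytes) / target)


total = Fraction("14.5") * PB  # total data stored in the cluster
# ozone.scm.container.size = 5GB, hdds.container.close.threshold = 0.9
ideal = ideal_container_count(total, 5 * GB, "0.9")
actual = 15_000_000  # observed container count

print(ideal)                     # 3378745
print(f"{ideal / actual:.1%}")   # 22.5%
print(f"{actual / ideal:.2f}x")  # 4.44x
```

Rounding up with `math.ceil` reflects that a partial container still counts as one container.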

> Support Ozone container merge
> -----------------------------
>
>                 Key: HDDS-13608
>                 URL: https://issues.apache.org/jira/browse/HDDS-13608
>             Project: Apache Ozone
>          Issue Type: Wish
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> In an ideal cluster, each container is closed when it is full (e.g.
> nearing the 5GB size). However, in real clusters these containers are often
> closed prematurely for one reason or another, which leaves many small
> containers. Small and large containers are treated equally during container
> replication, so operations like decommissioning take longer because many of
> these small containers have to be replicated (and a replication limit of N
> containers can cause spikes in replication throughput because container
> sizes differ). A large number of containers also puts high memory pressure
> on SCM and Datanodes (e.g. a large dentry cache, which can cause high native
> memory pressure if many container directories are accessed at once; this is
> one of the reasons why we internally switched to the lightweight
> DedicatedDiskSpaceUsage instead of the heavier DU).
> This is a wish to kickstart discussion on container merging. If there are
> small CLOSED containers, we can schedule merge operations to combine them
> into a single container. However, there are many foreseeable complexities,
> since we might need to create a new container ID (or pick one), which will
> differ from what is stored in the key location info in the OM. One way is to
> add a mapping layer between the old container ID and the new (merged)
> container ID when getting the block location, but this adds memory and
> lookup overhead, and can cause more than one level of indirection if a
> container is merged more than once. One idea to resolve this is for the OM
> to persistently store the change in the container ID, after which SCM can
> safely remove the old container ID from its cache. Another possible issue
> is picking the containers to merge: the replicas of two containers might
> not be on the same set of datanodes. Therefore, we might need to do some
> replication before merging, which adds further overhead.
> Asynchronous merging of this kind is implemented in systems like
> ClickHouse ("part" merges).
>  
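The mapping-layer idea from the description, including its chained-indirection problem, can be sketched as follows. This is a minimal illustration that assumes SCM keeps an in-memory map from each retired container ID to the container it was merged into, and that a merge target is always a live container. `ContainerIdRemap`, `record_merge` and `resolve` are hypothetical names, not Ozone APIs. Repeated merges form chains (101 -> 200 -> 300). Path compression, a technique borrowed from union-find, reduces each stale ID to a single hop after its first lookup.

```python
from typing import Dict, Iterable


class ContainerIdRemap:
    """Hypothetical map from a retired (merged) container ID to the container
    it was merged into. Not an existing Ozone class."""

    def __init__(self) -> None:
        self._merged_into: Dict[int, int] = {}

    def record_merge(self, sources: Iterable[int], target: int) -> None:
        """Record that every source container was merged into target.
        Assumes target is a live container, so no cycles are created."""
        for src in sources:
            if src != target:
                self._merged_into[src] = target

    def resolve(self, container_id: int) -> int:
        """Return the live container that now holds container_id's blocks."""
        # Follow the chain of merges until reaching an ID that was never retired.
        root = container_id
        while root in self._merged_into:
            root = self._merged_into[root]
        # Path compression: point every ID on the chain directly at the root,
        # so the next lookup for any of them is a single hop.
        node = container_id
        while node != root:
            nxt = self._merged_into[node]
            self._merged_into[node] = root
            node = nxt
        return root


remap = ContainerIdRemap()
remap.record_merge([101, 102, 103], target=200)  # first merge
remap.record_merge([200, 201], target=300)       # merged container 200 is merged again
print(remap.resolve(101))  # 300 (walks 101 -> 200 -> 300, then compresses)
print(remap.resolve(101))  # 300 (now a single hop)
```

Compression bounds the lookup cost, but the map still grows with every retired ID. This is why having the OM persist the new container ID, so that SCM can drop the old entry, remains attractive.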



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
