[ 
https://issues.apache.org/jira/browse/HDDS-13608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18041233#comment-18041233
 ] 

Ivan Andika commented on HDDS-13608:
------------------------------------

Note that container merging was already raised in the initial Ozone design 
(HDFS-7240).
{quote}Merges and splits of containers. We need nice large 5GB containers to 
hit the SCM scalability targets. However, I think we're going to have a harder 
time with this than a system like HBase. HDFS sees a relatively high delete 
rate for recently written data, e.g. intermediate data in a processing 
pipeline. HDFS also sees a much higher variance in key/value size. Together, 
these factors mean Ozone will likely be doing many more merges and splits than 
HBase to keep the container size high. This is concerning since splits and 
merges are expensive operations, and based on HBase's experience, are hard to 
get right.
{quote}
A few of the issues raised there remain valid:
 * High delete rate for recently written data: this might be an issue, but 
there is only a bounded number of open containers at any given time, and most 
containers hold data written within a similar time period.
 * Designing a consistent container split and merge (similar to HBase region 
split and merge): this requires some kind of general barrier implementation 
(see https://issues.apache.org/jira/browse/HBASE-12439 for the ProcedureV2 
implementation).
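To make the barrier idea concrete, here is a minimal sketch of a barrier-style merge procedure, loosely inspired by HBase's ProcedureV2 (HBASE-12439). All names here (MergeProcedure, the state labels, datanode IDs) are illustrative assumptions, not actual Ozone or HBase APIs: the point is only that the merge must not commit until every replica-holding datanode has acknowledged the prepare step, so a lagging node can never observe a half-merged container.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical barrier sketch: a merge advances PREPARE -> COMMIT -> DONE,
// and the PREPARE -> COMMIT transition only happens once every replica
// datanode has acknowledged that it staged the merge locally.
public class MergeProcedure {
  enum State { PREPARE, COMMIT, DONE }

  private final Set<String> pendingAcks;
  private State state = State.PREPARE;

  public MergeProcedure(Set<String> replicaDatanodes) {
    this.pendingAcks = new HashSet<>(replicaDatanodes);
  }

  // A datanode acknowledges it has staged the merge locally.
  // Returns the procedure state after processing the ack.
  public State acknowledge(String datanode) {
    if (state == State.PREPARE && pendingAcks.remove(datanode)
        && pendingAcks.isEmpty()) {
      state = State.COMMIT; // barrier reached: all replicas are staged
    }
    return state;
  }

  // Called by the coordinator once the commit has been applied everywhere.
  public void finishCommit() {
    if (state == State.COMMIT) {
      state = State.DONE;
    }
  }

  public State getState() {
    return state;
  }
}
```

A real implementation would additionally need to persist the procedure state (as ProcedureV2 does with a write-ahead log) so the barrier survives an SCM restart; this sketch only shows the in-memory state machine.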

> Support Ozone container merge
> -----------------------------
>
>                 Key: HDDS-13608
>                 URL: https://issues.apache.org/jira/browse/HDDS-13608
>             Project: Apache Ozone
>          Issue Type: Wish
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>
> In an ideal cluster, each container is closed when it is full (i.e. nearing 
> the 5GB size limit). However, in real clusters these containers are often 
> closed prematurely for one reason or another, which results in a lot of 
> small containers. Small and big containers are treated equally during 
> container replication, which causes operations like decommissioning to take 
> longer, since they need to replicate many of these small containers (so a 
> replication limit of N can sometimes cause spikes in replication throughput 
> due to the differing container sizes). A large number of containers can also 
> put high memory pressure on SCM and Datanodes (e.g. large dentry caches, 
> which can cause high native memory pressure if many container directories 
> are accessed at once; this is one of the reasons we internally switched to 
> the lightweight DedicatedDiskSpaceUsage instead of the heavy DU).
> This is a wish to kickstart discussion on container merging. If there are 
> small CLOSED containers, we can schedule merge operations to combine them 
> into a single container. However, there are a lot of foreseen complexities, 
> since we might need to create a new container ID (or pick one of the 
> existing IDs), which will then differ from what is stored in the key 
> location info in the OM. One way is to add a layer of mapping from the old 
> container ID to the new (merged) container ID when resolving the block 
> location, but this adds memory and lookup overhead, and can introduce more 
> than one level of indirection if containers are merged repeatedly. Another 
> issue is picking which containers to merge: two containers' replicas might 
> not reside on the same set of datanodes, so we might need to do some 
> replication before merging, which adds further overhead.
>  
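The "layer of mapping" mentioned in the description could be sketched as a forwarding table from old container IDs to their merged targets. This is purely an illustrative assumption, not an existing Ozone structure: the class and method names below are hypothetical. Applying union-find-style path compression would keep lookups to a single hop even after repeated merges, which addresses the "more than one indirection" concern.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical forwarding map: after a merge, an old container ID forwards
// to the merged container ID. Chains from repeated merges are collapsed on
// lookup (path compression), so resolution stays a single hop over time.
public class ContainerIdForwardingMap {
  private final Map<Long, Long> forward = new HashMap<>();

  // Record that oldId's data now lives in mergedId.
  public void recordMerge(long oldId, long mergedId) {
    forward.put(oldId, resolve(mergedId));
  }

  // Resolve an ID to its current container, compressing chains as we go.
  public long resolve(long id) {
    Long next = forward.get(id);
    if (next == null) {
      return id; // never merged away
    }
    long root = resolve(next);
    if (root != next) {
      forward.put(id, root); // path compression: skip intermediate hops
    }
    return root;
  }
}
```

Even with compression, this table would have to live in SCM (or OM) memory and be persisted, so it reduces but does not eliminate the overhead the description worries about; garbage-collecting entries once OM key locations are rewritten would be a natural follow-up.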



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
