[jira] [Comment Edited] (HDDS-5327) EC: WritableEcContainerProvider should dynamically adjust the open container groups

Mark Gui (Jira) Mon, 21 Mar 2022 04:38:05 -0700


    [ https://issues.apache.org/jira/browse/HDDS-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509680#comment-17509680 ]


Mark Gui edited comment on HDDS-5327 at 3/21/22, 11:37 AM:
-----------------------------------------------------------

Hi [~sodonnell], I'm very interested in this problem since it affects
performance, data consistency, and stability.

From the description, I take away several points:
 # A minimum number of pipelines, so that writes are spread across
containers/disks/DNs. (done)
 # A maximum number of pipelines, to prevent too many containers from staying
open for too long. (todo)
 # The number of pipelines grows elastically from the minimum toward the
maximum, so that different client loads can be served well. (todo)
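
A minimal sketch of how points 1 to 3 could fit together, assuming a hypothetical load signal (the number of active writers) drives growth. The class name, the writersPerPipeline parameter, and the example values 5 and 40 are illustrative assumptions, not part of Ozone's WritableEcContainerProvider.

```java
/**
 * Hypothetical sketch: choose a target number of open EC pipelines that
 * grows elastically from a configured minimum toward a maximum.
 * The "active writers" load metric is an assumption for illustration.
 */
public final class ElasticPipelineTarget {

  private final int minPipelines;
  private final int maxPipelines;
  // Assumed: how many concurrent writers one open pipeline can serve well.
  private final int writersPerPipeline;

  public ElasticPipelineTarget(int minPipelines, int maxPipelines, int writersPerPipeline) {
    if (minPipelines < 1 || maxPipelines < minPipelines || writersPerPipeline < 1) {
      throw new IllegalArgumentException("require 1 <= min <= max and writersPerPipeline >= 1");
    }
    this.minPipelines = minPipelines;
    this.maxPipelines = maxPipelines;
    this.writersPerPipeline = writersPerPipeline;
  }

  /** Target = ceil(activeWriters / writersPerPipeline), clamped to [min, max]. */
  public int target(int activeWriters) {
    int wanted = (activeWriters + writersPerPipeline - 1) / writersPerPipeline;
    return Math.max(minPipelines, Math.min(maxPipelines, wanted));
  }

  public static void main(String[] args) {
    ElasticPipelineTarget t = new ElasticPipelineTarget(5, 40, 2);
    System.out.println(t.target(0));   // 5  (never below the minimum)
    System.out.println(t.target(30));  // 15 (grows with load)
    System.out.println(t.target(500)); // 40 (capped at the maximum)
  }
}
```

Clamping keeps the minimum even when the cluster is idle and stops growth at the maximum regardless of load; a real policy might also weigh container age, as the issue description suggests.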

EC pipelines differ from Ratis pipelines in that they are lightweight: DNs do
not need to handshake with each other to form a Raft group with RaftServers set
up. Also, for EC there is exactly one container per pipeline.

A few thoughts:
 # I think the minimum number of pipelines can slow down writes. With the
current implementation of the minimum limit, once there are 5 open containers
we prefer not to allocate more. All concurrent writes then share those 5
containers and their underlying DNs and disks until the containers are full (as
incrementally reported by DNs), and only then are new containers created.
During this period the total bandwidth is bounded by the DNs and disks of those
containers; of course, we could manually tune the minimum higher for larger
clusters. Some example numbers, using
freon ockg -n 10 -s $((10*1024*1024*1024)) -t 10 with EC: rs-3-2-1024k:
pipeline.minimum=5, total time: {*}436s{*}; pipeline.minimum=15, total time:
{*}347s{*}. That is roughly a 25% speedup (436/347 ≈ 1.26, i.e. about 20% less
wall-clock time), but at the cost of more open containers in the cluster, so we
should pick a balanced value here.
 # The maximum number of pipelines could be calculated the same way as for
Ratis pipelines: {*}MaxNumOpenPipelinesInCluster = MaxNumOpenPipelinesPerDN *
NumOfHealthyDNs / ReplicationConfig.getRequiredNodes(){*} (see
RatisPipelineProvider.java). This seems reasonable: a) more DNs means more
pipelines; b) more pipelines per DN means more pipelines overall; c) more
required nodes per pipeline means fewer pipelines overall. Disks are not taken
into account here, since each DN has its own disk management policy and SCM
does not care about it. If we wanted to consider disks, we would also have to
consider network bandwidth, and it is hard to compute a limit from all of these
factors; instead, we could offer a config item as Ratis does
(ozone.scm.datanode.pipeline.limit, default: 2). For EC pipelines,
*MaxNumOpenPipelinesPerDN* could be larger than for Ratis, since Ratis
pipelines are heavyweight while EC pipelines are lightweight.
 # To allow the number of pipelines to grow, we need some criteria, as you
said: cluster load, container age, etc. The goal is to create new pipelines
quickly, before the minimum set of pipelines becomes overloaded, and to stop
growing once the maximum is reached. In the current implementation, new
pipelines are only created when there are fewer than 5 pipelines, or when no
existing pipeline is suitable for the block being allocated (not enough space,
or excluded by the client).
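
The maximum from point 2 and the allocation rule from point 3 can be sketched as follows. This is a hedged illustration only: the class, record, and method names are hypothetical, perDnLimit stands in for a per-DN setting analogous to ozone.scm.datanode.pipeline.limit, and the example assumes rs-3-2 needs 3 data + 2 parity = 5 nodes and a cluster of 20 healthy DNs. The nested record requires Java 16 or later.

```java
import java.util.List;
import java.util.Set;

public final class EcPipelineLimits {

  /** MaxNumOpenPipelinesInCluster = MaxNumOpenPipelinesPerDN * NumOfHealthyDNs / requiredNodes. */
  static int maxOpenPipelines(int perDnLimit, int healthyDns, int requiredNodes) {
    if (requiredNodes <= 0) {
      throw new IllegalArgumentException("requiredNodes must be > 0");
    }
    // Integer division: leftover DNs that cannot form a full pipeline are not counted.
    return perDnLimit * healthyDns / requiredNodes;
  }

  /** Hypothetical view of an open EC pipeline (one container per pipeline). */
  record OpenPipeline(String id, long freeBytes) {}

  /**
   * Create a new pipeline if fewer than `minimum` are open, or if no open pipeline
   * can take the block (too little free space, or excluded by the client).
   * The `maximum` check is the proposed addition from point 2.
   */
  static boolean shouldCreatePipeline(List<OpenPipeline> open, int minimum, int maximum,
                                      long blockSize, Set<String> excluded) {
    if (open.size() >= maximum) {
      return false; // proposed cap on open pipelines
    }
    if (open.size() < minimum) {
      return true;  // existing minimum rule
    }
    boolean anySuitable = open.stream()
        .anyMatch(p -> p.freeBytes() >= blockSize && !excluded.contains(p.id()));
    return !anySuitable;
  }

  public static void main(String[] args) {
    // Assumed: per-DN limit 2, 20 healthy DNs, rs-3-2 needs 5 nodes.
    System.out.println(maxOpenPipelines(2, 20, 5)); // 8
  }
}
```

Checking the cap first means that on a small cluster, where the computed maximum can fall below the configured minimum, the cap wins. What a writer should do when the cap is reached and no pipeline is suitable (wait, or close a container first) is left open in this sketch.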




> EC: WritableEcContainerProvider should dynamically adjust the open container 
> groups
> -----------------------------------------------------------------------------------
>
>                 Key: HDDS-5327
>                 URL: https://issues.apache.org/jira/browse/HDDS-5327
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Priority: Major
>
> After some discussion we concluded that for any given EC policy, a minimum 
> number of pipelines should be allocated so writes can be directed to 
> different containers.
> The absolute maximum number of pipelines can be calculated as some function 
> of the cluster nodes and disks, but we are still unsure about how to 
> calculate that limit.
> The number of pipelines should be able to grow from the minimum toward the 
> maximum, depending on the write load on the cluster, or perhaps the age of 
> the oldest open container (to prevent too many containers remaining open for 
> too long).
> The goal is to allow a sufficient number of open containers so the writes are 
> spread across different disks, without having to maintain too many open 
> containers on the cluster.
> If the write load is very high, there should be more open containers than if 
> the write load is very low.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
