[
https://issues.apache.org/jira/browse/HDDS-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509680#comment-17509680
]
Mark Gui edited comment on HDDS-5327 at 3/21/22, 11:37 AM:
-----------------------------------------------------------
Hi [~sodonnell], I'm very interested in this problem since it impacts
performance, data consistency, and stability.
From the description, these are the points I take away:
# A min number of pipelines to allow writes to be spread across
containers/Disks/DNs. (done)
# A max number of pipelines to prevent too many containers open for too long.
(todo)
# The number of pipelines elastically grows from min to max, so different
loads from clients could be served well. (todo)
EC pipelines differ from Ratis pipelines in that they are lightweight: the
DNs don't have to handshake to form a Raft group with RaftServers set up. Also,
for EC there is exactly 1 container per pipeline.
So here are my thoughts on each point:
# For the min number of pipelines, I think the current limit could slow down
writes. In the current implementation of the min limit, once there are 5 open
containers we prefer not to allocate more. All concurrent writes then share
those 5 containers and their underlying DNs and disks until the containers
are reported full (via incremental reports from the DNs), at which point new
containers are created. During this period the total bandwidth is bounded by
the DNs and disks backing those containers. Of course, we could manually tune
the minimum higher for larger clusters. Some example numbers: freon ockg -n 10
-s $((10*1024*1024*1024)) -t 10, EC: rs-3-2-1024k. With pipeline.minimum=5
the total time cost was {*}436s{*}; with pipeline.minimum=15 it was
{*}347s{*}. That is about 25% more throughput (about 20% less time), but it
also leaves more open containers in the cluster, so we'd better pick a
balanced number here.
# The max number of pipelines could be calculated in the same way as for
Ratis pipelines: {*}MaxNumOpenPipelinesInCluster = MaxNumOpenPipelinesPerDN *
NumOfHealthyDNs / ReplicationConfig.getRequiredNodes(){*} (refer to
RatisPipelineProvider.java). This should be reasonable: a) more DNs, more
pipelines; b) more pipelines per DN, more pipelines overall; c) more required
DNs per pipeline, fewer pipelines overall. Here we don't take disks into
consideration, since each DN has its own disk management policy and SCM
doesn't care about it. If we wanted to consider disks, we should also
consider network bandwidth. It is hard to do calculations over these factors,
but we could offer a config item as Ratis does
(ozone.scm.datanode.pipeline.limit, default: 2). For EC pipelines, the
*MaxNumOpenPipelinesPerDN* could be larger than for Ratis, since Ratis
pipelines are heavy but EC pipelines are light.
# To allow the number of pipelines to grow, we need some criteria, just as
you said: cluster load, container age, etc. The goal is to create new
pipelines quickly, before the min pipelines are overloaded, and to stop
growing once the max is reached. In the current implementation, new pipelines
are only created when there are < 5 pipelines, or when there is no suitable
pipeline for the block being allocated (not enough space, or excluded by the
client).
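
To make points 2 and 3 above concrete, here is a minimal Java sketch of the
max formula and of a min/max growth check. It is only an illustration under
assumptions, not Ozone code: the class EcPipelineLimitSketch, both method
names, and the per-DN limit of 5 used in main are hypothetical. The only
things taken from the comment are the formula itself, the minimum of 5, and
the rule "create when below min or when no suitable pipeline exists". The cap
at max is the proposed (todo) behaviour, and letting min take precedence when
the computed max is smaller than min is my own assumption.

```java
/** Illustrative sketch only; all names here are hypothetical, not Ozone's API. */
final class EcPipelineLimitSketch {

  private EcPipelineLimitSketch() { }

  /**
   * Cluster-wide cap from the formula in the comment:
   * perDnLimit * healthyDns / requiredNodes (integer division, rounds down).
   */
  static int maxOpenPipelines(int perDnLimit, int healthyDns, int requiredNodes) {
    if (perDnLimit < 0 || healthyDns < 0 || requiredNodes <= 0) {
      throw new IllegalArgumentException("invalid pipeline limit inputs");
    }
    return perDnLimit * healthyDns / requiredNodes;
  }

  /**
   * Growth check: always create below min, never create at or above max,
   * and in between create only when no existing pipeline is suitable.
   * Assumption: min wins if the computed max happens to be smaller than min.
   */
  static boolean shouldCreatePipeline(int open, int min, int max,
      boolean hasSuitablePipeline) {
    if (open < min) {
      return true;
    }
    if (open >= max) {
      return false;
    }
    return !hasSuitablePipeline;
  }

  public static void main(String[] args) {
    int requiredNodes = 5; // rs-3-2 needs 3 data + 2 parity nodes
    int perDnLimit = 5;    // hypothetical EC per-DN limit, not a real default
    int max = maxOpenPipelines(perDnLimit, 10, requiredNodes);
    System.out.println("max open EC pipelines = " + max);
    System.out.println("create at 7 open, none suitable: "
        + shouldCreatePipeline(7, 5, max, false));
  }
}
```

Note that with the Ratis default of 2 pipelines per DN, 10 healthy DNs and
rs-3-2, the formula gives 2 * 10 / 5 = 4, which is below the current minimum
of 5. That is one reason a larger per-DN limit for EC seems worth considering.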
> EC: WritableEcContainerProvider should dynamically adjust the open container
> groups
> -----------------------------------------------------------------------------------
>
> Key: HDDS-5327
> URL: https://issues.apache.org/jira/browse/HDDS-5327
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: SCM
> Reporter: Stephen O'Donnell
> Priority: Major
>
> After some discussion we concluded that for any given EC policy, a minimum
> number of pipelines should be allocated so writes can be directed to
> different containers.
> The absolute maximum number of pipelines can be calculated as some function
> of the cluster nodes and disks, but we are still unsure about how to
> calculate that limit.
> The number of pipelines should be able to grow from the minimum toward the
> maximum, depending on the write load on the cluster, or perhaps the age of
> the oldest open container (to prevent too many containers remaining open for
> too long).
> The goal is to allow a sufficient number of open containers so the writes are
> spread across different disks, without having to maintain too many open
> containers on the cluster.
> If the write load is very high, there should be more open containers than if
> the write load is very low.