[
https://issues.apache.org/jira/browse/HDDS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zita Dombi updated HDDS-6379:
-----------------------------
Fix Version/s: 1.3.0
> Not deducting the STANDALONE pipelines when counting pipelines on each
> datanode to check the pipeline limit.
> ------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-6379
> URL: https://issues.apache.org/jira/browse/HDDS-6379
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode, SCM
> Affects Versions: 1.2.0
> Reporter: Zita Dombi
> Assignee: Zita Dombi
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.3.0
>
>
> I found this bug while adding robot tests for the ozone debug CLI, and I
> was able to reproduce it locally. I had three datanodes and created a new
> pipeline with the ozone admin pipeline create command, which chose a
> datanode and made a STANDALONE/ONE pipeline with it. I then stopped a
> datanode and waited until it reached the DEAD state. After I started it
> again, no RATIS/THREE pipeline was created, even though there were three
> healthy datanodes and no existing RATIS/THREE pipeline.
> In the docker-config the ozone.scm.datanode.pipeline.limit property is set to
> 1 (the default is 2) due to the multi raft support. When we try to create
> the pipeline, we build a list of healthy datanodes and filter it by the
> pipeline limit. The current pipeline count on a datanode is calculated
> like this:
> {code:java}
> int currentPipelineCount(DatanodeDetails datanodeDetails, int nodesRequired) {
>   // Datanodes from pipeline in some states can also be considered available
>   // for pipeline allocation. Thus the number of these pipeline shall be
>   // deducted from total heaviness calculation.
>   int pipelineNumDeductable = 0;
>   Set<PipelineID> pipelines = nodeManager.getPipelines(datanodeDetails);
>   for (PipelineID pid : pipelines) {
>     Pipeline pipeline;
>     try {
>       pipeline = stateManager.getPipeline(pid);
>     } catch (PipelineNotFoundException e) {
>       LOG.debug("Pipeline not found in pipeline state manager during" +
>           " pipeline creation. PipelineID: {}", pid, e);
>       continue;
>     }
>     if (pipeline != null &&
>         // single node pipeline are not accounted for while determining
>         // the pipeline limit for dn
>         pipeline.getType() == HddsProtos.ReplicationType.RATIS &&
>         (RatisReplicationConfig
>             .hasFactor(pipeline.getReplicationConfig(),
>                 ReplicationFactor.ONE)
>             ||
>             pipeline.getReplicationConfig().getRequiredNodes()
>                 == nodesRequired &&
>             pipeline.getPipelineState()
>                 == Pipeline.PipelineState.CLOSED)) {
>       pipelineNumDeductable++;
>     }
>   }
>   return pipelines.size() - pipelineNumDeductable;
> }
> {code}
> Only pipelines with the RATIS replication type are deducted (because of this
> condition: pipeline.getType() == HddsProtos.ReplicationType.RATIS), so the
> STANDALONE/ONE pipeline is counted. That datanode therefore reaches the
> pipeline limit, and no RATIS/THREE pipeline is created.
> We should deduct all the single node pipelines in this check.
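
To illustrate the difference, here is a minimal, self-contained sketch of the counting logic with the reported condition next to one that deducts every single node pipeline. It is only an illustration under stated assumptions, not necessarily the change that was merged: SimplePipeline, the two enums, and both method names are hypothetical stand-ins for the real Ozone classes, and the limit of 1 mirrors the docker-config value mentioned above.

```java
import java.util.List;

public class PipelineCountSketch {
  // Hypothetical stand-ins for the Ozone types used in the issue.
  enum ReplicationType { RATIS, STANDALONE }
  enum PipelineState { OPEN, CLOSED }

  // Simplified pipeline: replication type, number of nodes, and state.
  static final class SimplePipeline {
    final ReplicationType type;
    final int requiredNodes;
    final PipelineState state;

    SimplePipeline(ReplicationType type, int requiredNodes,
        PipelineState state) {
      this.type = type;
      this.requiredNodes = requiredNodes;
      this.state = state;
    }
  }

  // Mirrors the reported condition: only RATIS pipelines can be deducted.
  static int countAsReported(List<SimplePipeline> pipelines,
      int nodesRequired) {
    int deductible = 0;
    for (SimplePipeline p : pipelines) {
      if (p.type == ReplicationType.RATIS
          && (p.requiredNodes == 1
              || (p.requiredNodes == nodesRequired
                  && p.state == PipelineState.CLOSED))) {
        deductible++;
      }
    }
    return pipelines.size() - deductible;
  }

  // Assumed fix: single node pipelines are deducted regardless of type;
  // the CLOSED pipeline deduction stays limited to RATIS as before.
  static int countWithSingleNodeDeducted(List<SimplePipeline> pipelines,
      int nodesRequired) {
    int deductible = 0;
    for (SimplePipeline p : pipelines) {
      boolean singleNode = p.requiredNodes == 1;
      boolean closedRatis = p.type == ReplicationType.RATIS
          && p.requiredNodes == nodesRequired
          && p.state == PipelineState.CLOSED;
      if (singleNode || closedRatis) {
        deductible++;
      }
    }
    return pipelines.size() - deductible;
  }

  public static void main(String[] args) {
    // The datanode from the reproduction: one STANDALONE/ONE pipeline.
    List<SimplePipeline> onDatanode = List.of(
        new SimplePipeline(ReplicationType.STANDALONE, 1, PipelineState.OPEN));
    int limit = 1; // ozone.scm.datanode.pipeline.limit in the docker-config

    // The datanode is eligible only while its count is below the limit.
    System.out.println(countAsReported(onDatanode, 3) < limit);             // false
    System.out.println(countWithSingleNodeDeducted(onDatanode, 3) < limit); // true
  }
}
```

With the reported condition the datanode counts one pipeline and is filtered out at a limit of 1, which leaves only two candidates for a RATIS/THREE pipeline. With the single node deduction the count drops to zero and the datanode stays eligible.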