[ https://issues.apache.org/jira/browse/HDDS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zita Dombi updated HDDS-6379:
-----------------------------
    Fix Version/s: 1.3.0

> Not deducting the STANDALONE pipelines when counting pipelines on each 
> datanode to check the pipeline limit.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-6379
>                 URL: https://issues.apache.org/jira/browse/HDDS-6379
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode, SCM
>    Affects Versions: 1.2.0
>            Reporter: Zita Dombi
>            Assignee: Zita Dombi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.3.0
>
>
> I found this bug while trying to add robot tests for the ozone debug CLI, 
> but I was able to reproduce it locally. I had three datanodes and created a 
> new pipeline with the ozone admin pipeline create command, which chose a 
> datanode and created a STANDALONE/ONE pipeline on it. After that I stopped a 
> datanode and waited until it reached the DEAD state; after I started it 
> again, no RATIS/THREE pipeline was created, even though there were three 
> healthy datanodes and no existing RATIS/THREE pipeline.
> In the docker-config the ozone.scm.datanode.pipeline.limit property is set 
> to 1 (the default is 2) because of multi-raft support. When we try to create 
> a pipeline, we build a list of healthy datanodes, filtering it based on the 
> pipeline limit. The current pipeline count on a datanode is calculated like 
> this:
> {code:java}
> int currentPipelineCount(DatanodeDetails datanodeDetails, int nodesRequired) {
>     // Datanodes from pipeline in some states can also be considered available
>     // for pipeline allocation. Thus the number of these pipelines shall be
>     // deducted from total heaviness calculation.
>     int pipelineNumDeductable = 0;
>     Set<PipelineID> pipelines = nodeManager.getPipelines(datanodeDetails);
>     for (PipelineID pid : pipelines) {
>       Pipeline pipeline;
>       try {
>         pipeline = stateManager.getPipeline(pid);
>       } catch (PipelineNotFoundException e) {
>         LOG.debug("Pipeline not found in pipeline state manager during" +
>             " pipeline creation. PipelineID: {}", pid, e);
>         continue;
>       }
>       if (pipeline != null &&
>           // single node pipeline are not accounted for while determining
>           // the pipeline limit for dn
>           pipeline.getType() == HddsProtos.ReplicationType.RATIS &&
>           (RatisReplicationConfig
>               .hasFactor(pipeline.getReplicationConfig(),
>                   ReplicationFactor.ONE)
>               ||
>               pipeline.getReplicationConfig().getRequiredNodes()
>                   == nodesRequired &&
>                   pipeline.getPipelineState()
>                       == Pipeline.PipelineState.CLOSED)) {
>         pipelineNumDeductable++;
>       }
>     }
>     return pipelines.size() - pipelineNumDeductable;
>   }
> {code}
> We only deduct pipelines with the RATIS replication type (because of the 
> condition pipeline.getType() == HddsProtos.ReplicationType.RATIS), so the 
> STANDALONE/ONE pipeline is counted. Because of that we reach the pipeline 
> limit on that datanode and therefore never create a RATIS/THREE pipeline.
> We should deduct all single-node pipelines in this check.
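
The proposed fix can be sketched with simplified stand-in types (these are not Ozone's actual Pipeline/ReplicationConfig classes, just a minimal model): a pipeline is deducted from the count whenever its replication config requires a single node, regardless of replication type.

```java
// Minimal sketch of the proposed deduction rule, using simplified stand-in
// types rather than Ozone's real Pipeline/ReplicationConfig classes.
enum ReplicationType { RATIS, STANDALONE }
enum PipelineState { OPEN, CLOSED }

class Pipeline {
  final ReplicationType type;
  final int requiredNodes;
  final PipelineState state;

  Pipeline(ReplicationType type, int requiredNodes, PipelineState state) {
    this.type = type;
    this.requiredNodes = requiredNodes;
    this.state = state;
  }
}

public class PipelineCountSketch {
  /** True if the pipeline should NOT count toward the datanode's limit. */
  static boolean isDeductible(Pipeline p, int nodesRequired) {
    // Proposed fix: deduct every single-node pipeline (RATIS/ONE and
    // STANDALONE/ONE alike), plus CLOSED pipelines of the requested size,
    // instead of only RATIS pipelines.
    return p.requiredNodes == 1
        || (p.type == ReplicationType.RATIS
            && p.requiredNodes == nodesRequired
            && p.state == PipelineState.CLOSED);
  }

  public static void main(String[] args) {
    Pipeline standaloneOne =
        new Pipeline(ReplicationType.STANDALONE, 1, PipelineState.OPEN);
    Pipeline ratisThree =
        new Pipeline(ReplicationType.RATIS, 3, PipelineState.OPEN);
    // With the fix the STANDALONE/ONE pipeline no longer counts toward the
    // limit, so the datanode stays eligible for a new RATIS/THREE pipeline.
    System.out.println(isDeductible(standaloneOne, 3)); // true
    System.out.println(isDeductible(ratisThree, 3));    // false
  }
}
```

With ozone.scm.datanode.pipeline.limit set to 1, deducting the STANDALONE/ONE pipeline leaves the datanode's effective count at 0, so it remains available for the RATIS/THREE pipeline.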



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
