xBis7 opened a new pull request, #5726:
URL: https://github.com/apache/ozone/pull/5726

   ## What changes were proposed in this pull request?
   
   When a node is decommissioning, its replicas are copied to other nodes, and 
once that process has finished the node becomes decommissioned. After the 
copies are made, the container appears as mis-replicated because of the excess 
replicas. The replicas on the decommissioned node are unavailable, and the node 
is expected to be stopped. For that reason, replicas that belong to 
decommissioning or decommissioned nodes shouldn't cause a container to be 
counted as mis-replicated.
   
   Nodes in maintenance are not filtered out, because such nodes are expected 
to come back and no replica copies are made while they enter that state. 
Mis-replication on a node in maintenance is the same as mis-replication on a 
healthy, active node.
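
The filtering described above can be sketched as follows. This is a minimal illustration, not the code in this pull request: `MisReplicationSketch`, `OpState`, `Replica`, `replicasToCount`, and `isMisReplicated` are hypothetical names, and the per-rack limit is an assumed, simplified placement rule rather than the actual RackScatter validation. The scenario in `main` also assumes the replacement replica lands on the same rack as the decommissioned node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MisReplicationSketch {

  /** Hypothetical stand-in for a datanode's operational state. */
  public enum OpState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  /** Hypothetical replica record: the rack it sits on and the state of its node. */
  public static final class Replica {
    final String rack;
    final OpState nodeState;

    public Replica(String rack, OpState nodeState) {
      this.rack = rack;
      this.nodeState = nodeState;
    }
  }

  /** Drops replicas on decommissioning/decommissioned nodes; maintenance replicas stay. */
  public static List<Replica> replicasToCount(List<Replica> replicas) {
    List<Replica> counted = new ArrayList<>();
    for (Replica r : replicas) {
      boolean leaving = r.nodeState == OpState.DECOMMISSIONING
          || r.nodeState == OpState.DECOMMISSIONED;
      if (!leaving) {
        counted.add(r);
      }
    }
    return counted;
  }

  /** Assumed rule: a rack may hold at most ceil(replicationFactor / totalRacks) replicas. */
  public static boolean isMisReplicated(List<Replica> replicas, int replicationFactor,
      int totalRacks) {
    int maxPerRack = (replicationFactor + totalRacks - 1) / totalRacks; // ceiling division
    Map<String, Integer> perRack = new HashMap<>();
    for (Replica r : replicas) {
      // merge returns the updated count for this rack
      if (perRack.merge(r.rack, 1, Integer::sum) > maxPerRack) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Factor-THREE container on three racks after decommissioning a rack3 node:
    // the old replica is still reported next to its replacement on the same rack.
    List<Replica> replicas = Arrays.asList(
        new Replica("/rack1", OpState.IN_SERVICE),
        new Replica("/rack2", OpState.IN_SERVICE),
        new Replica("/rack3", OpState.DECOMMISSIONED),
        new Replica("/rack3", OpState.IN_SERVICE));

    System.out.println("unfiltered: " + isMisReplicated(replicas, 3, 3));
    System.out.println("filtered:   " + isMisReplicated(replicasToCount(replicas), 3, 3));
  }
}
```

Under these assumptions the unfiltered check prints `true` and the filtered check prints `false`. A replica in `IN_MAINTENANCE` passes through `replicasToCount` unchanged, so a rack that is over its limit because of a maintenance node would still be flagged.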
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-9683
   
   ## How was this patch tested?
   
   New unit tests were added. The change can also be tested manually with the 
`ozone-topology` docker env as follows:
   
   * Edit `docker-config` to enable RackScatter policy (easier to reproduce 
with RackScatter)
     * ```diff
       - OZONE-SITE.XML_ozone.scm.container.placement.impl=org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware
       + #OZONE-SITE.XML_ozone.scm.container.placement.impl=org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware
       +
       + OZONE-SITE.XML_ozone.scm.container.placement.impl=org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackScatter
       + OZONE-SITE.XML_ozone.scm.pipeline.placement.impl=org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackScatter
       +
       + # For decommission
       + OZONE-SITE.XML_ozone.scm.nodes.scmservice=scm
       + OZONE-SITE.XML_ozone.scm.address.scmservice.scm=scm
       +
       + # Expedite the container replication checking
       + OZONE-SITE.XML_hdds.scm.replication.thread.interval=15s
       ```
   * Edit `network-config`
     * ```diff
       - 10.5.0.6       /rack1
       + 10.5.0.6       /rack2
       10.5.0.7 /rack2
       - 10.5.0.8       /rack2
       + 10.5.0.8       /rack3
       - 10.5.0.9       /rack2
       + 10.5.0.9       /rack3
       ```
   * Create a key with replication Ratis THREE
     * `ozone sh key put /vol1/bucket1/key1 /etc/hosts -t=RATIS -r=THREE`
   * Find the replicas for container 1 and decommission one of the replica nodes
     * ```
        bash-4.2$ ozone admin container info 1
            get 1 datanode
        bash-4.2$ ozone admin scm roles
            copy SCM IP
        bash-4.2$ ozone admin datanode list
            copy datanode IP
        bash-4.2$ ozone admin datanode decommission -id=scmservice --scm=172.23.0.2:9894 172.23.0.8/ozone-datanode-2.ozone_default
        Started decommissioning datanode(s):
        172.23.0.8/ozone-datanode-2.ozone_default
        ```
   * Once the node is decommissioned, check the SCM container report with 
`ozone admin container report` and check the Recon container page.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

