sodonnel opened a new pull request #2975:
URL: https://github.com/apache/ozone/pull/2975


   ## What changes were proposed in this pull request?
   
   Replication Manager periodically processes all the containers in SCM, and so 
has a view of the health of the system. However, it does not count up the 
replicas in each state, so that overview of the system health is not 
easily visible.
   
   This Jira adds a ReplicationManagerReport object, which ReplicationManager 
can populate by incrementing various counters as it processes the containers. 
The report counts the number of containers in each lifecycle state, 
and also the number of containers in various health states, 
e.g. under-replicated, over-replicated, etc. The report also stores a sample of 
the container IDs in each state (max of 100 per state), 
which can be extracted later for debugging (a later Jira will provide that 
extra feature, probably via an Ozone Admin command).
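   
   As a rough illustration of the counting-plus-sampling idea described above, 
here is a minimal sketch. It is not the class added by this PR: the class name 
`SampleReport`, the `HealthState` enum, and all method names are hypothetical, 
and only the limit of 100 sampled IDs per state comes from the description.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; NOT the ReplicationManagerReport from this PR. */
class SampleReport {
  /** Hypothetical subset of health states, named after those in the description. */
  enum HealthState { UNDER_REPLICATED, OVER_REPLICATED, MIS_REPLICATED, MISSING }

  /** Upper bound on sampled container IDs per state (from the PR description). */
  static final int SAMPLE_LIMIT = 100;

  private final Map<HealthState, Long> counts =
      new EnumMap<>(HealthState.class);
  private final Map<HealthState, List<Long>> samples =
      new EnumMap<>(HealthState.class);

  /** Counts one container in the state; keeps its ID only while under the limit. */
  synchronized void incrementAndSample(HealthState state, long containerId) {
    counts.merge(state, 1L, Long::sum);
    List<Long> ids = samples.computeIfAbsent(state, s -> new ArrayList<>());
    if (ids.size() < SAMPLE_LIMIT) {
      ids.add(containerId);
    }
  }

  /** Returns the number of containers counted in the state (0 if none). */
  synchronized long getCount(HealthState state) {
    return counts.getOrDefault(state, 0L);
  }

  /** Returns a copy of the sampled container IDs for the state. */
  synchronized List<Long> getSample(HealthState state) {
    List<Long> ids = samples.get(state);
    if (ids == null) {
      return new ArrayList<>();
    }
    return new ArrayList<>(ids);
  }
}
```

   In this sketch the counter keeps growing after the sample is full, so the 
count stays accurate while the memory used for container IDs stays bounded.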
   
   The report is integrated with the ReplicationManagerMetrics class, and will 
add metrics like the following:
   
   ```
   {
       "name" : "Hadoop:service=StorageContainerManager,name=ReplicationManagerMetrics",
       "modelerType" : "ReplicationManagerMetrics",
       "tag.Hostname" : "3212ea4aecc5",
       "InflightReplication" : 0,
       "InflightDeletion" : 0,
       "InflightMove" : 0,
   
   ## New metrics from here
       "NumOpenContainers" : 1,
       "NumClosingContainers" : 0,
       "NumQuasiClosedContainers" : 2,
       "NumClosedContainers" : 0,
       "NumDeletingContainers" : 0,
       "NumDeletedContainers" : 0,
       "NumUnderReplicatedContainers" : 0,
       "NumMisReplicatedContainers" : 0,
       "NumOverReplicatedContainers" : 0,
       "NumMissingContainers" : 0,
       "NumUnhealthyContainers" : 0,
       "NumEmptyContainers" : 0,
       "NumOpenUnhealthyContainers" : 0,
       "NumStuckQuasiClosedContainers" : 0,
   ## end of new metrics    
       "NumReplicationCmdsSent" : 0,
       "NumReplicationCmdsCompleted" : 0,
       "NumReplicationCmdsTimeout" : 0,
       "NumDeletionCmdsSent" : 0,
       "NumDeletionCmdsCompleted" : 0,
       "NumDeletionCmdsTimeout" : 0,
       "NumReplicationBytesTotal" : 0,
       "NumReplicationBytesCompleted" : 0,
       "NumDeletionBytesTotal" : 0,
       "NumDeletionBytesCompleted" : 0,
       "ReplicationTimeNumOps" : 0,
       "ReplicationTimeAvgTime" : 0.0,
       "DeletionTimeNumOps" : 0,
       "DeletionTimeAvgTime" : 0.0
     }
   ```
   
   Note that some of the above metrics (the ones for container LifeCycle state) 
duplicate others available from a different source in SCM:
   
   ```
   {
       "name" : "Hadoop:service=StorageContainerManager,name=SCMContainerMetrics",
       "modelerType" : "SCMContainerMetrics",
       "tag.Hostname" : "92503dfbd5db",
       "OpenContainers" : 0,
       "ClosingContainers" : 0,
       "QuasiClosedContainers" : 0,
       "ClosedContainers" : 0,
       "DeletingContainers" : 0,
       "DeletedContainers" : 0,
       "TotalContainers" : 0
     }
   ```  
   
   I am open to removing these duplicates from the new ReplicationManager 
metrics, but it may be useful to keep them: the ReplicationManager counts 
are captured at a point in time and are calculated differently, and hence 
may help in debugging some problems. For that reason, it is still worth 
capturing them in the Report object, but it is debatable whether they should 
also be exposed as metrics.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6170
   
   ## How was this patch tested?
   
   New unit tests and validated in docker-compose environment.
   

