sodonnel opened a new pull request, #4152:
URL: https://github.com/apache/ozone/pull/4152

   ## What changes were proposed in this pull request?
   
   ```
       "EcReplicationCmdsSentTotal" : 0,
       "EcDeletionCmdsSentTotal" : 259,
       "EcReplicationCmdsCompletedTotal" : 51,
       "EcDeletionCmdsCompletedTotal" : 51,
       "EcReconstructionCmdsSentTotal" : 571,
       "EcReplicationCmdsTimeoutTotal" : 765,
       "EcDeletionCmdsTimeoutTotal" : 204
   ```
   
   The total for replication commands sent is 0, yet 765 are reported as timed out.
   
   I think the code is working as intended, but it is confusing.
   
   We have metrics for "EcReplicationCmdsSentTotal" and 
EcReconstructionCmdsSentTotal. However, on completion or timeout we only have 
EcReplicationCmdsCompletedTotal and EcReplicationCmdsTimeoutTotal; there is no 
reconstruction completed / timeout metric. This is because we track 
completion in ContainerReplicaPendingOps, and all it sees is a replica that has 
been scheduled to be created. It doesn't know whether it's a simple copy or a 
reconstruction that is going to create it.
   
   That can explain why "EcReplicationCmdsSentTotal=0" and 
"EcReplicationCmdsTimeoutTotal=765" - likely all these scheduled commands were 
actually reconstructions, as we have 571 of those sent.
   
   Why then do we have more EC replications completed and timed out than 
sent? An EC reconstruction can create multiple new replicas in a single 
command. It is tracked as a single command when sent, but when it completes 
in pending ops, one completion is counted per replica. So we can 
schedule a reconstruction to create 2 new replicas, and we will end up with 1 
command sent and 2 in EcReplicationCmdsCompletedTotal.
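   
   The mismatch can be illustrated with a minimal sketch. This is a simplified 
model and not the actual Ozone code: the class `EcMetricsSketch`, its fields and 
its methods are hypothetical names chosen for illustration. It only assumes what 
is described above, namely that sends are counted per command while pending ops 
are tracked and completed per replica.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; none of these names are Ozone's real classes.
class EcMetricsSketch {
  long replicationCmdsSent = 0;    // one per replication command sent
  long reconstructionCmdsSent = 0; // one per reconstruction command sent
  long replicasCompleted = 0;      // one per replica seen as created
  long replicasTimedOut = 0;       // one per replica that expired

  // One pending entry per replica index. The entry does not record
  // whether a copy or a reconstruction is going to create the replica.
  private final List<Integer> pendingAdds = new ArrayList<>();

  // A simple copy creates exactly one replica.
  void sendReplication(int replicaIndex) {
    replicationCmdsSent++;
    pendingAdds.add(replicaIndex);
  }

  // A single reconstruction command may create several replicas.
  void sendReconstruction(int... missingIndexes) {
    reconstructionCmdsSent++;        // counted once per command
    for (int index : missingIndexes) {
      pendingAdds.add(index);        // tracked once per replica
    }
  }

  // Completion is observed per replica, so it is counted per replica.
  boolean completeAdd(int replicaIndex) {
    boolean wasPending = pendingAdds.remove(Integer.valueOf(replicaIndex));
    if (wasPending) {
      replicasCompleted++;
    }
    return wasPending;
  }

  // Every replica still pending is counted as one timeout.
  int expireAll() {
    int expired = pendingAdds.size();
    replicasTimedOut += expired;
    pendingAdds.clear();
    return expired;
  }
}
```

   In this model, one call to `sendReconstruction(2, 4)` followed by 
`completeAdd(2)` and `completeAdd(4)` leaves `reconstructionCmdsSent` at 1, 
`replicationCmdsSent` at 0 and `replicasCompleted` at 2, which is the same shape 
as the numbers reported above.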
   
   To make this less confusing I have renamed the "complete" metrics in this PR 
to be Replicas created / deleted / timed out, rather than commands.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-7695
   
   ## How was this patch tested?
   
   Existing tests should cover this, as it's just a rename of variables / methods.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

