devmadhuu opened a new pull request, #10384:
URL: https://github.com/apache/ozone/pull/10384
## What changes were proposed in this pull request?
This PR extends `ReconScmContainerSyncMetrics` to expose per-state metrics
for the four actively reconciled container states:
- `OPEN`
- `QUASI_CLOSED`
- `CLOSED`
- `DELETED`
For each state, Recon now reports:
- Last sync-pass duration in milliseconds.
- Last pre-sync observed container-count drift, computed as `SCM count -
Recon count`.
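
As a rough illustration of the two per-state values, the sketch below records the pre-sync drift and then times one state pass. It is not the code from this PR: `StatePassSketch`, `PassResult`, and `runPass` are hypothetical names, and the sync work is stood in for by a plain `Runnable`.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual ReconScmContainerSyncMetrics code.
final class StatePassSketch {

  // Holds the two values reported per state after a pass.
  static final class PassResult {
    final long durationMs;
    final long drift;

    PassResult(long durationMs, long drift) {
      this.durationMs = durationMs;
      this.drift = drift;
    }
  }

  private StatePassSketch() { }

  // Drift as defined in this PR: SCM count minus Recon count.
  static long drift(long scmCount, long reconCount) {
    return scmCount - reconCount;
  }

  // Captures drift before the pass runs, then measures the pass duration.
  static PassResult runPass(long scmCount, long reconCount, Runnable syncPass) {
    long preSyncDrift = drift(scmCount, reconCount);
    long startNanos = System.nanoTime();
    syncPass.run();
    long elapsedNanos = System.nanoTime() - startNanos;
    return new PassResult(TimeUnit.NANOSECONDS.toMillis(elapsedNanos), preSyncDrift);
  }
}
```

The drift is read before the pass starts, so it reflects what the pass was about to reconcile rather than the post-sync state.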
The existing overall targeted sync metrics remain unchanged:
- `targetedSyncStatus`
- `lastTargetedSyncDurationMs`
## Why are the changes needed?
Recon periodically syncs container state from SCM, every 6 hours by
default. Before this change, metrics only showed the overall targeted sync
status and total duration. Admins could not tell:
- Which state's pass accounted for the sync time.
- Whether the latest cycle observed count drift for a specific state.
- Whether SCM had more or fewer containers than Recon for a given
reconciled state.
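
Because drift is defined as `SCM count - Recon count`, its sign answers the last question directly. A minimal sketch of that reading, with `DriftSignSketch` and its `Direction` enum as hypothetical names that do not appear in the PR:

```java
// Hypothetical helper for reading the sign of a drift value.
final class DriftSignSketch {

  enum Direction { SCM_HAS_MORE, SCM_HAS_FEWER, EQUAL }

  private DriftSignSketch() { }

  // drift = SCM count - Recon count, so a positive value means SCM has more.
  static Direction classify(long drift) {
    if (drift > 0) {
      return Direction.SCM_HAS_MORE;
    }
    if (drift < 0) {
      return Direction.SCM_HAS_FEWER;
    }
    return Direction.EQUAL;
  }
}
```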
The new metrics make this visible in Hadoop metrics and downstream
Prometheus time-series data.
## What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-15413
## How was this patch tested?
Ran the following tests:
`TestReconStorageContainerSyncHelper`, `TestReconScmContainerSyncMetrics`