nada-attia opened a new pull request, #18133: URL: https://github.com/apache/hudi/pull/18133
### Describe the issue this Pull Request addresses

This PR adds comprehensive metrics collection during timeline archival operations to enable better monitoring and debugging of archival processes in production environments. This is particularly important for detecting and diagnosing OOM failures and other archival-related issues.

### Summary and Changelog

**Summary:** Adds detailed metrics emission during timeline archival to track operation success, failure types, and commit counts.

**Changelog:**

- Added `ArchivalMetrics` utility class with metric name constants
- Enhanced `TimelineArchiverV1` and `TimelineArchiverV2` to collect metrics during archival
- Added `getMetrics()` method to `HoodieTimelineArchiver` interface
- Implemented OOM-specific failure detection and tracking
- Added general exception tracking with exception class names
- Added tracking for the count of commits being archived (all commits and write commits)
- Added archival operation status metric (success/failure)
- Enhanced `HoodieMetrics.updateArchivalMetrics()` to accept a map of metrics
- Integrated metrics collection into `BaseHoodieTableServiceClient.archive()`

### Impact

**Public API Changes:**

- Added new method `getMetrics()` to `HoodieTimelineArchiver` interface (default implementation returns an empty map for backward compatibility)
- Added new overloaded method `updateArchivalMetrics(Map<String, Long>)` to `HoodieMetrics` class

**User-Facing Changes:**

Users will now have access to the following new metrics emitted during archival operations:

- `archivalOutOfMemory`: Tracks OOM failures during archival
- `archivalFailure.<ExceptionClassName>`: Tracks failures by exception type
- `archivalNumAllCommits`: Count of all commits being archived
- `archivalNumWriteCommits`: Count of write commits being archived
- `archivalStatus`: 1 for success, -1 for failure

### Risk Level

**Low**

The changes are additive and don't modify existing behavior:

- The new `getMetrics()` method has a default implementation returning an empty map
- Metrics collection happens in try-catch blocks and won't affect archival logic
- Existing metrics continue to work as before
- The changes are backward compatible

### Documentation Update

None - This is an internal metrics enhancement. The new metrics are self-explanatory and follow existing Hudi metrics naming conventions.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
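
For readers who want a concrete picture of the metrics described above, here is a minimal, self-contained sketch. It is not the code in this PR. The metric names are taken from the description, but `ArchivalMetricsSketch`, `TimelineArchiverSketch`, and `archiveWithMetrics` are hypothetical names invented for illustration, and the archival step is simulated with a `Runnable`. The sketch swallows the failure after recording it, purely for brevity; the description does not say whether the real archivers rethrow.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ArchivalMetricsSketch {

  // Metric names as listed in the PR description.
  static final String OOM = "archivalOutOfMemory";
  static final String FAILURE_PREFIX = "archivalFailure.";
  static final String NUM_ALL_COMMITS = "archivalNumAllCommits";
  static final String NUM_WRITE_COMMITS = "archivalNumWriteCommits";
  static final String STATUS = "archivalStatus";

  /** Hypothetical stand-in for an archiver interface with an additive default method. */
  interface TimelineArchiverSketch {
    // Implementations that predate the method inherit the empty map,
    // which is what keeps the change backward compatible.
    default Map<String, Long> getMetrics() {
      return Collections.emptyMap();
    }
  }

  /**
   * Runs a simulated archival step and records the metrics around it.
   * Assumption: counts are known before the step runs.
   */
  public static Map<String, Long> archiveWithMetrics(
      int numAllCommits, int numWriteCommits, Runnable archiveStep) {
    Map<String, Long> metrics = new HashMap<>();
    metrics.put(NUM_ALL_COMMITS, (long) numAllCommits);
    metrics.put(NUM_WRITE_COMMITS, (long) numWriteCommits);
    try {
      archiveStep.run();
      metrics.put(STATUS, 1L); // 1 for success
    } catch (OutOfMemoryError e) {
      // OOM is tracked under its own metric, separate from other failures.
      metrics.put(OOM, 1L);
      metrics.put(STATUS, -1L); // -1 for failure
    } catch (Exception e) {
      // Other failures are keyed by the exception class name.
      metrics.put(FAILURE_PREFIX + e.getClass().getSimpleName(), 1L);
      metrics.put(STATUS, -1L);
    }
    return metrics;
  }

  public static void main(String[] args) {
    // An archiver that does not override getMetrics() reports nothing.
    TimelineArchiverSketch legacy = new TimelineArchiverSketch() {};
    System.out.println(legacy.getMetrics().isEmpty());

    System.out.println(archiveWithMetrics(5, 3, () -> { }));
    System.out.println(archiveWithMetrics(5, 3, () -> {
      throw new IllegalStateException("simulated failure");
    }));
    System.out.println(archiveWithMetrics(5, 3, () -> {
      throw new OutOfMemoryError("simulated OOM");
    }));
  }
}
```

In this sketch a failed run still reports the commit counts, so a dashboard could correlate failures with the number of commits being archived. Whether the PR emits counts on failure is an assumption here.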
