nada-attia opened a new pull request, #18133:
URL: https://github.com/apache/hudi/pull/18133

   ### Describe the issue this Pull Request addresses
   
   This PR emits metrics during timeline archival so that archival runs can be 
monitored and debugged in production environments. The metrics are aimed in 
particular at detecting and diagnosing out-of-memory (OOM) failures and other 
archival-related errors.
   
   ### Summary and Changelog
   
   **Summary:** Adds detailed metrics emission during timeline archival to 
track operation success, failure types, and commit counts.
   
   **Changelog:**
   - Added `ArchivalMetrics` utility class with metric name constants
   - Enhanced `TimelineArchiverV1` and `TimelineArchiverV2` to collect metrics 
during archival
   - Added `getMetrics()` method to `HoodieTimelineArchiver` interface
   - Implemented OOM-specific failure detection and tracking
   - Added general exception tracking with exception class names
   - Added tracking for the number of commits being archived (all commits and 
write commits)
   - Added archival operation status metric (success/failure)
   - Added an overload of `HoodieMetrics.updateArchivalMetrics()` that accepts a map of metrics
   - Integrated metrics collection into `BaseHoodieTableServiceClient.archive()`
   
   ### Impact
   
   **Public API Changes:** 
   - Added new method `getMetrics()` to `HoodieTimelineArchiver` interface 
(default implementation returns an empty map for backward compatibility)
   - Added new overloaded method `updateArchivalMetrics(Map<String, Long>)` to 
`HoodieMetrics` class
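
As an illustration of why a default method keeps this change backward compatible, here is a minimal sketch. The types `ArchiverSketch`, `LegacyArchiverSketch`, and `MetricsAwareArchiverSketch` are hypothetical stand-ins, not the actual Hudi interface or classes, and the `archiveIfRequired()` signature is simplified for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the archiver interface (not Hudi's real definition).
interface ArchiverSketch {
  // Simplified archival entry point; returns the number of instants archived.
  int archiveIfRequired();

  // Default implementation: archivers that predate the change report no metrics.
  default Map<String, Long> getMetrics() {
    return Collections.emptyMap();
  }
}

// An implementation written before getMetrics() existed still compiles unchanged.
class LegacyArchiverSketch implements ArchiverSketch {
  @Override
  public int archiveIfRequired() {
    return 0;
  }
}

// A newer implementation opts in by overriding getMetrics().
class MetricsAwareArchiverSketch implements ArchiverSketch {
  private final Map<String, Long> metrics = new HashMap<>();

  @Override
  public int archiveIfRequired() {
    // Record a success status; the metric name is taken from the PR description.
    metrics.put("archivalStatus", 1L);
    return 0;
  }

  @Override
  public Map<String, Long> getMetrics() {
    return Collections.unmodifiableMap(metrics);
  }
}
```

Callers can read `getMetrics()` from any archiver without a type check: older implementations yield an empty map, so there is nothing to emit for them.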
   
   **User-Facing Changes:**
   Users will now have access to the following new metrics emitted during 
archival operations:
   - `archivalOutOfMemory`: Tracks OOM failures during archival
   - `archivalFailure.<ExceptionClassName>`: Tracks failures by exception type
   - `archivalNumAllCommits`: Count of all commits being archived
   - `archivalNumWriteCommits`: Count of write commits being archived
   - `archivalStatus`: 1 for success, -1 for failure
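
The metrics listed above could be populated roughly as in the following sketch. It is an assumption-laden illustration, not the code in this PR: `ArchivalMetricsSketch`, `ArchivalStep`, and `collect` are hypothetical names, the use of the value `1` for the OOM and failure metrics is assumed, and so is the use of the simple (unqualified) exception class name. The sketch also swallows the failure for brevity, whereas a real archiver would likely rethrow it.

```java
import java.util.HashMap;
import java.util.Map;

class ArchivalMetricsSketch {
  // Metric names taken from the PR description.
  static final String OOM = "archivalOutOfMemory";
  static final String FAILURE_PREFIX = "archivalFailure.";
  static final String NUM_ALL_COMMITS = "archivalNumAllCommits";
  static final String NUM_WRITE_COMMITS = "archivalNumWriteCommits";
  static final String STATUS = "archivalStatus";

  // Hypothetical hook standing in for the actual archival work.
  interface ArchivalStep {
    void run() throws Exception;
  }

  // Runs the step and returns the metrics describing its outcome.
  static Map<String, Long> collect(long numAllCommits, long numWriteCommits, ArchivalStep step) {
    Map<String, Long> metrics = new HashMap<>();
    metrics.put(NUM_ALL_COMMITS, numAllCommits);
    metrics.put(NUM_WRITE_COMMITS, numWriteCommits);
    try {
      step.run();
      metrics.put(STATUS, 1L); // 1 for success
    } catch (OutOfMemoryError oom) {
      // OOM is an Error, not an Exception, so it needs its own catch clause.
      metrics.put(OOM, 1L);
      metrics.put(STATUS, -1L); // -1 for failure
    } catch (Exception e) {
      // Assumed: the metric suffix is the simple class name of the exception.
      metrics.put(FAILURE_PREFIX + e.getClass().getSimpleName(), 1L);
      metrics.put(STATUS, -1L);
    }
    return metrics;
  }
}
```

For example, `ArchivalMetricsSketch.collect(5, 3, () -> { throw new IllegalStateException("boom"); })` would yield a map containing `archivalFailure.IllegalStateException` and an `archivalStatus` of -1, alongside the two commit counts.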
   
   ### Risk Level
   
   **Low**
   
   The changes are additive and don't modify existing behavior:
   - The new `getMetrics()` method has a default implementation returning an 
empty map
   - Metrics collection happens in try-catch blocks and won't affect archival 
logic
   - Existing metrics continue to work as before
   - The changes are backward compatible
   
   ### Documentation Update
   
   None - This is an internal metrics enhancement. The new metrics are 
self-explanatory and follow existing Hudi metrics naming conventions.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable

