vamshikrishnakyatham opened a new pull request, #13719: URL: https://github.com/apache/hudi/pull/13719
## What is the purpose of the pull request

This PR implements new SQL procedures for viewing Hudi cleaning operations to provide better visibility into table maintenance activities:

- `show_cleans`: Display completed clean operations with timing and file deletion statistics
- `show_clean_plans`: Display requested clean plans and schedules with retention policies
- `show_cleans_metadata`: Display partition-level clean metadata with detailed statistics

These procedures help users monitor cleaning performance, debug storage issues, and understand table maintenance operations.

## Brief change log

- Implement ShowCleansProcedure for displaying completed clean operations
- Implement ShowCleansPlanProcedure for displaying requested clean plans and schedules
- Add ShowCleansPartitionMetadataProcedure for partition-level clean metadata
- Register new procedures in HoodieProcedures.scala
- Add comprehensive test suite TestShowCleansProcedures.scala with edge cases
- Support limit parameter for result pagination
- Include proper error handling for non-existent tables
- Follow existing procedure patterns and coding standards

### Change Logs

**Context:** Added three new SQL procedures to provide visibility into Hudi's cleaning operations, which was previously only available through CLI tools or direct timeline inspection.
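
For reviewers who want to try the procedures, here is a minimal sketch of how the `CALL` statements might be built and issued from Scala. It is illustrative only: the `table` argument name, the table name `hudi_tbl`, and the helper object `ShowCleansCalls` are assumptions that do not appear in this PR description; only the three procedure names and the optional `limit` parameter come from the text above.

```scala
// Sketch only: builds Spark SQL CALL statements for the three procedures
// named in this PR. The `table` argument name and the table name `hudi_tbl`
// are assumptions; `limit` is the optional parameter described in the PR.
object ShowCleansCalls {
  val procedures: Seq[String] =
    Seq("show_cleans", "show_clean_plans", "show_cleans_metadata")

  // Returns a statement such as: CALL show_cleans(table => 't', limit => 10)
  def callStatement(procedure: String, table: String, limit: Option[Int] = None): String = {
    val args = Seq(s"table => '$table'") ++ limit.map(n => s"limit => $n").toSeq
    s"CALL $procedure(${args.mkString(", ")})"
  }

  def main(args: Array[String]): Unit = {
    // Print one statement per procedure, each limited to 10 rows.
    procedures.foreach(p => println(callStatement(p, "hudi_tbl", Some(10))))
    // With a SparkSession that has the Hudi SQL extension enabled, a statement
    // could then be run as:
    //   spark.sql(callStatement("show_cleans", "hudi_tbl", Some(10))).show(false)
  }
}
```

Keeping the statement builder separate from the Spark call means the generated SQL can be inspected without a running cluster.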
**Summary:**

- New `show_cleans` procedure displays completed cleaning operations with metadata like timing, files deleted, and retention policies
- New `show_clean_plans` procedure shows scheduled/requested clean operations before execution
- New `show_cleans_metadata` procedure provides partition-level cleaning details for debugging
- All procedures follow existing Hudi procedure patterns and include comprehensive error handling
- No code was copied; implementation follows existing procedure patterns in the codebase

### Impact

**Public API Changes:**

- Adds three new SQL procedures: `show_cleans`, `show_clean_plans`, `show_cleans_metadata`
- All procedures accept optional `limit` parameter for pagination
- New procedures are registered and available via `CALL` statements in Spark SQL

**User-Facing Features:**

- Users can now monitor cleaning operations directly via SQL
- Better debugging capabilities for storage and maintenance issues
- Consistent interface with other Hudi procedures

**Performance Impact:**

- Minimal performance impact - procedures only read timeline metadata
- No impact on write operations or table performance
- Procedures use existing timeline APIs with no additional I/O overhead

### Risk level (write none, low medium or high below)

**Low**

This is a new feature addition that only adds SQL procedures without modifying existing functionality. The changes are isolated, well-tested, and follow established patterns. No existing APIs or behaviors are modified.
### Documentation Update

**Website Update Required:**

- New SQL procedures need to be documented on Hudi website under SQL procedures section
- Will create follow-up JIRA ticket for website documentation update
- Code includes comprehensive ScalaDoc documentation for all new classes and methods

**Config Changes:**

- None - no new configurations added or existing defaults changed

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [x] CI passed
