vamshikrishnakyatham opened a new pull request, #13719:
URL: https://github.com/apache/hudi/pull/13719

   ## What is the purpose of the pull request
   
   This PR implements new SQL procedures for viewing Hudi cleaning operations 
to provide better visibility into table maintenance activities:
   
   - `show_cleans`: Display completed clean operations with timing and file 
deletion statistics
   - `show_clean_plans`: Display requested clean plans and schedules with 
retention policies  
   - `show_cleans_metadata`: Display partition-level clean metadata with 
detailed statistics
   
   These procedures help users monitor cleaning performance, debug storage 
issues, and understand table maintenance operations.
   
   ## Brief change log
   
   - Implement ShowCleansProcedure for displaying completed clean operations
   - Implement ShowCleansPlanProcedure for displaying requested clean plans and 
schedules
   - Add ShowCleansPartitionMetadataProcedure for partition-level clean metadata
   - Register new procedures in HoodieProcedures.scala
   - Add comprehensive test suite TestShowCleansProcedures.scala with edge cases
   - Support limit parameter for result pagination
   - Include proper error handling for non-existent tables
   - Follow existing procedure patterns and coding standards
   
   ### Change Logs
   
   **Context:** Adds three new SQL procedures that expose Hudi's cleaning 
operations, which were previously visible only through CLI tools or direct 
timeline inspection.
   
   **Summary:** 
   - New `show_cleans` procedure displays completed cleaning operations with 
metadata like timing, files deleted, and retention policies
   - New `show_clean_plans` procedure shows scheduled/requested clean 
operations before execution
   - New `show_cleans_metadata` procedure provides partition-level cleaning 
details for debugging
   - All procedures follow existing Hudi procedure patterns and include 
comprehensive error handling
   - No code was copied; implementation follows existing procedure patterns in 
the codebase
   
   ### Impact
   
   **Public API Changes:**
   - Adds three new SQL procedures: `show_cleans`, `show_clean_plans`, 
`show_cleans_metadata`
   - All procedures accept optional `limit` parameter for pagination
   - New procedures are registered and available via `CALL` statements in Spark 
SQL
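   
   As a rough sketch of how these procedures might be invoked from Spark SQL: 
the procedure names and the optional `limit` parameter come from this PR, while 
the `table` parameter name is an assumption based on the convention of other 
Hudi procedures, and `hudi_table` is a hypothetical table name.
   
   ```sql
   -- Completed clean operations (table parameter name is assumed)
   CALL show_cleans(table => 'hudi_table');

   -- Requested clean plans, capped at 10 rows via the optional limit
   CALL show_clean_plans(table => 'hudi_table', limit => 10);

   -- Partition-level clean metadata, capped at 20 rows
   CALL show_cleans_metadata(table => 'hudi_table', limit => 20);
   ```
   
   The exact output columns are defined by the procedure implementations and 
are not reproduced here.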
   
   **User-Facing Features:**
   - Users can now monitor cleaning operations directly via SQL
   - Better debugging capabilities for storage and maintenance issues
   - Consistent interface with other Hudi procedures
   
   **Performance Impact:**
   - Minimal performance impact - procedures only read timeline metadata
   - No impact on write operations or table performance
   - Procedures use existing timeline APIs; the only I/O is reading timeline metadata
   
   ### Risk level (write none, low, medium or high below)
   
   **Low**
   
   This is a new feature addition that only adds SQL procedures without 
modifying existing functionality. The changes are isolated, well-tested, and 
follow established patterns. No existing APIs or behaviors are modified.
   
   ### Documentation Update
   
   **Website Update Required:**
   - New SQL procedures need to be documented on the Hudi website under the SQL 
procedures section
   - Will create follow-up JIRA ticket for website documentation update
   - Code includes comprehensive ScalaDoc documentation for all new classes and 
methods
   
   **Config Changes:** 
   - None - no new configurations added or existing defaults changed
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
