suryaprasanna opened a new pull request, #18204:
URL: https://github.com/apache/hudi/pull/18204

   ### Describe the issue this Pull Request addresses
   
   Today, running Hive sync is typically coupled to ingestion/streaming 
execution paths. In operational scenarios, teams often need a lightweight way 
to trigger Hive sync externally and independently (for example, after 
backfills, manual data corrections, or quick metadata reconciliation).
   
   This PR adds an external Hive sync utility job so users can execute Hive 
sync on demand without embedding the logic into another pipeline execution.
   
   ### Summary and Changelog
   
   - Added `HudiHiveSyncJob` under `hudi-utilities` as an external runner for 
Hive sync.
   - Added CLI/config support for base path, base file format, props file, and 
override configs.
   - Wired the job to build sync properties and invoke `HiveSyncTool` directly.
   - Added end-to-end test `TestHudiHiveSyncJob` validating:
     - writing a Hudi dataset without metastore registration,
     - verifying table absence before sync,
     - running `HudiHiveSyncJob`,
     - verifying successful registration after sync.
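
   As a rough illustration of the "build sync properties" step described above, the sketch below merges a props file, `key=value` overrides, and the base path and base file format into a single `Properties` object. It is not the code from this PR: the class `SyncPropsSketch`, the method `buildSyncProps`, and the precedence order (props file, then overrides, then explicit CLI values) are hypothetical, and the two config keys are assumed to match Hudi's sync configs and should be checked against the Hudi version in use.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for illustration only; not the PR's HudiHiveSyncJob. */
public class SyncPropsSketch {

    // Assumed Hudi config keys; verify against your Hudi version.
    public static final String BASE_PATH_KEY = "hoodie.datasource.meta.sync.base.path";
    public static final String BASE_FILE_FORMAT_KEY = "hoodie.datasource.hive_sync.base_file_format";

    /**
     * Builds sync properties. Assumed precedence (lowest to highest):
     * props file, then key=value overrides, then explicit base path / file format.
     */
    public static Properties buildSyncProps(Reader propsFile, List<String> overrides,
                                            String basePath, String baseFileFormat)
            throws IOException {
        Properties props = new Properties();
        if (propsFile != null) {
            props.load(propsFile); // standard java.util.Properties file format
        }
        for (String override : overrides) {
            int eq = override.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + override);
            }
            props.setProperty(override.substring(0, eq).trim(),
                              override.substring(eq + 1).trim());
        }
        props.setProperty(BASE_PATH_KEY, basePath);
        props.setProperty(BASE_FILE_FORMAT_KEY, baseFileFormat);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Example values are made up for this sketch.
        Reader file = new StringReader(
            "hoodie.datasource.hive_sync.database=default\n"
          + "hoodie.datasource.hive_sync.table=trips\n");
        Properties props = buildSyncProps(
            file,
            List.of("hoodie.datasource.hive_sync.table=trips_backfill"),
            "/tmp/hudi/trips",
            "PARQUET");
        System.out.println(props.getProperty("hoodie.datasource.hive_sync.table"));
        System.out.println(props.getProperty(BASE_PATH_KEY));
        // In a real job, the resulting properties would then be handed to
        // HiveSyncTool; that step is omitted here because it needs Hudi and
        // Hadoop on the classpath.
    }
}
```

   Keeping the merge in a plain static method makes the precedence rules testable without a metastore; only the final hand-off to `HiveSyncTool` needs an integration test.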
   
   ### Impact
   
   - Adds a new operational utility for manual, on-demand Hive sync execution.
   - Improves maintainability by decoupling metadata synchronization from 
ingestion runtime when needed.
   - No change to existing ingestion behavior unless this utility is explicitly 
used.
   
   ### Risk Level
   
   low
   
   This is an additive utility with integration coverage. Existing sync flows 
remain unchanged.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   


