suryaprasanna opened a new pull request, #18204:
URL: https://github.com/apache/hudi/pull/18204
### Describe the issue this Pull Request addresses
Today, running Hive sync is typically coupled to ingestion/streaming
execution paths. In operational scenarios, teams often need a lightweight way
to trigger Hive sync externally and independently (for example, after
backfills, manual data corrections, or quick metadata reconciliation).
This PR adds an external Hive sync utility job so users can execute Hive
sync on demand without embedding the logic into another pipeline execution.
### Summary and Changelog
- Added `HudiHiveSyncJob` under `hudi-utilities` as an external runner for
Hive sync.
- Added CLI/config support for base path, base file format, props file, and
override configs.
- Wired the job to build sync properties and invoke `HiveSyncTool` directly.
- Added end-to-end test `TestHudiHiveSyncJob` validating:
  - writing a Hudi dataset without metastore registration,
  - verifying the table is absent before sync,
  - running `HudiHiveSyncJob`,
  - verifying successful registration after sync.
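The changelog says the job combines a props file, override configs, and explicit base path / base file format flags into one set of sync properties before calling `HiveSyncTool`. The sketch below shows one way that assembly could work, using only `java.util.Properties`. It is illustrative only. The class name `SyncPropsSketch`, the method `buildSyncProps`, the `key=value` override syntax, and the precedence order (props file, then overrides, then explicit flags) are all assumptions. The two key names follow Hudi's meta-sync naming but are also assumptions; check `HoodieSyncConfig` for the authoritative keys.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical sketch (not the PR's code) of assembling the Properties that an
 * external sync runner would hand to HiveSyncTool.
 */
public class SyncPropsSketch {

  // Assumed key names, modeled on Hudi's meta-sync configuration naming.
  static final String BASE_PATH_KEY = "hoodie.datasource.meta.sync.base.path";
  static final String BASE_FILE_FORMAT_KEY = "hoodie.datasource.hive_sync.base_file_format";

  /**
   * Assumed precedence, lowest to highest: props file, then key=value
   * overrides, then the explicit base path and base file format.
   */
  static Properties buildSyncProps(Reader propsFile, List<String> overrides,
                                   String basePath, String baseFileFormat) {
    Properties props = new Properties();
    if (propsFile != null) {
      try {
        props.load(propsFile); // standard java.util.Properties file format
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
    for (String kv : overrides) {
      int eq = kv.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Override must be key=value: " + kv);
      }
      props.setProperty(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
    }
    props.setProperty(BASE_PATH_KEY, basePath); // base path is always required
    if (baseFileFormat != null) {
      props.setProperty(BASE_FILE_FORMAT_KEY, baseFileFormat);
    }
    return props;
  }

  public static void main(String[] args) {
    Reader file = new StringReader(
        "hoodie.datasource.hive_sync.database=default\n"
            + "hoodie.datasource.hive_sync.table=trips\n");
    Properties props = buildSyncProps(file,
        List.of("hoodie.datasource.hive_sync.table=trips_v2"),
        "/tmp/hudi/trips", "PARQUET");
    // In the actual job, properties like these are what would be passed to HiveSyncTool.
    props.stringPropertyNames().stream().sorted()
        .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
  }
}
```

Letting the explicit flags win last keeps the required base path from being silently overridden by a stale props file. The real job may order these sources differently.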
### Impact
- Adds a new operational utility for manual, on-demand Hive sync execution.
- Improves maintainability by letting metadata synchronization run decoupled
from the ingestion runtime when needed.
- No change to existing ingestion behavior unless this utility is explicitly
used.
### Risk Level
low
This is an additive utility with integration coverage. Existing sync flows
remain unchanged.
### Documentation Update
none
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
Made with [Cursor](https://cursor.com)