suryaprasanna opened a new pull request, #18203:
URL: https://github.com/apache/hudi/pull/18203
### Describe the issue this Pull Request addresses
Hive sync in Spark-based environments depends on Hive metastore/Thrift
classes that are not always available or compatible at runtime, and it can
fail when they are not. This makes sync unstable even when the table lifecycle
is managed through Spark SQL catalog APIs.
This change enables Hive sync to use a Spark-catalog-backed
`IMetaStoreClient`, so metadata operations (table/partition/schema updates) can
run reliably in Hive-on-Spark setups without requiring a fully functional
external HMS client path.
### Summary and Changelog
- Added `SparkCatalogMetaStoreClient` that implements `IMetaStoreClient`
using Spark external catalog APIs for supported operations.
- Added `hoodie.datasource.hive_sync.use_spark_catalog` config and wired it
through Hive sync config plumbing.
- Updated `HoodieHiveSyncClient` to instantiate the Spark-catalog metastore
client when the new config is enabled.
- Added end-to-end Spark catalog sync tests in `TestSparkCatalogSync` for:
  - initial table and partition registration,
  - new partition registration after append writes,
  - partition drop visibility,
  - schema evolution visibility in catalog.
- Included follow-up fixes to make the Spark-catalog client compatible with
Hive sync metadata updates in test/runtime flows.
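
For reviewers, the gating described above can be pictured with the following minimal sketch. It is an illustration only, not the code in this PR: the two classes are hypothetical Python stand-ins for the existing HMS-backed client and for `SparkCatalogMetaStoreClient`, and the `"false"` default is an assumption inferred from the statement that default behavior is unchanged.

```python
from dataclasses import dataclass

# Config key introduced by this PR.
USE_SPARK_CATALOG_KEY = "hoodie.datasource.hive_sync.use_spark_catalog"


@dataclass
class HmsClient:
    """Hypothetical stand-in for the existing HMS/Thrift-backed client."""
    name: str = "hms"


@dataclass
class SparkCatalogClient:
    """Hypothetical stand-in for SparkCatalogMetaStoreClient."""
    name: str = "spark_catalog"


def select_metastore_client(props: dict):
    """Pick the Spark-catalog client only when the flag is explicitly 'true'.

    Assumption: a missing key behaves like 'false', so existing users
    keep the HMS path.
    """
    raw = str(props.get(USE_SPARK_CATALOG_KEY, "false"))
    enabled = raw.strip().lower() == "true"
    return SparkCatalogClient() if enabled else HmsClient()
```

With an empty property map the sketch returns the HMS stand-in; only an explicit `true` switches to the Spark-catalog stand-in.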
### Impact
- New optional behavior gated by
`hoodie.datasource.hive_sync.use_spark_catalog` (default remains unchanged).
- Improves reliability of Hive sync for Spark environments where direct
HMS/thrift dependencies are unavailable or fragile.
- No behavior change for existing users unless the new config is explicitly
enabled.
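
As a rough usage sketch, opting in from a Spark writer might look like the following. The helper name `hudi_sync_options` and its arguments are hypothetical; the `hoodie.*` keys other than the new flag are existing Hudi write and Hive sync options, and any further options a real job needs (record key, partition path, and so on) are omitted here.

```python
def hudi_sync_options(table: str, database: str, use_spark_catalog: bool = False) -> dict:
    """Build a Hudi writer option map with Hive sync enabled.

    Hypothetical helper for illustration; only the last key is new in this PR.
    """
    opts = {
        "hoodie.table.name": table,
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
    }
    if use_spark_catalog:
        # Opt in explicitly; leaving the key out keeps the existing sync path.
        opts["hoodie.datasource.hive_sync.use_spark_catalog"] = "true"
    return opts


opts = hudi_sync_options("trips", "analytics", use_spark_catalog=True)
```

In a PySpark job the resulting map would typically be passed along the lines of `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df` and `base_path` are assumed to exist.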
### Risk Level
low
The new path is opt-in and covered by end-to-end tests for partition
lifecycle and schema evolution. The default sync path is unchanged.
### Documentation Update
Config-level documentation is included in code for the new
`hoodie.datasource.hive_sync.use_spark_catalog` option. No additional
website/doc update is required for this internal sync-path enhancement.
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable