nsivabalan opened a new pull request, #19033: URL: https://github.com/apache/hudi/pull/19033
## Summary

- Parallelizes `dropPartitionsToTable` in the HiveQL sync mode (`hoodie.datasource.hive_sync.mode=hiveql`).
- Stacked on **#18984**. The diff currently includes both the HMS commit (#18983) and the HiveQL commit (#18984); once those merge, this PR will rebase cleanly onto master.
- Reviewing the top commit `7022e7d49341` in isolation gives the DROP-only delta (~119 added / 16 removed).

## What this fixes

HiveQL DROP goes through `IMetaStoreClient.dropPartition` (Thrift), not Hive Driver, so it can't reuse the `HiveDriverPool` introduced in #18984. Today it loops sequentially against the session metastore client. This PR wires the existing `IMetaStoreClientPool` from #18983 into `HiveQueryDDLExecutor` and uses it to fan drop batches across the pool's workers.

Behavior:

- `batching.enabled=false` (default): unchanged. `dropPartitionsToTable` iterates the partition list sequentially on the session metastore client, exactly as before.
- `batching.enabled=true`: partitions split into batches of `HIVE_BATCH_SYNC_PARTITION_NUM`, batches fan out across the pool's workers (one independent `IMetaStoreClient` per worker), first-error semantics match the HMS-mode implementation (first failure thrown, subsequent suppressed at WARN).

## Configs

No new configs. Reuses everything from #18983 / #18984:

| Key | Default |
|---|---|
| `hoodie.datasource.hive_sync.batching.enabled` | `false` |
| `hoodie.datasource.hive_sync.batching.threads` | `4` |
| `hoodie.datasource.hive_sync.batch_num` | `1000` |

## Test plan

- [x] `mvn compile` on `hudi-sync/hudi-hive-sync`: clean, 0 Checkstyle violations, 0 RAT issues
- [x] `mvn test` on `hudi-sync/hudi-hive-sync`: **309 tests, 0 failures, 0 errors** (was 308 on the parent branch)
- [x] `TestHiveSyncTool#testHiveQLDropPartitionsWithBatching` (new): creates 8 partitions, drops 4 through the parallel pool path with `threads=3` and `batch_num=2` (multiple drop batches dispatched in parallel), asserts the remaining set matches.
- [x] Existing 308 tests across all three sync modes pass unchanged.

## Files touched (top commit only)

- `HiveQueryDDLExecutor.java`: new constructor accepting `Option<IMetaStoreClientPool>`; `dropPartitionsToTable` now batches and fans out when the pool is present, falls back to sequential single-client path otherwise.
- `HoodieHiveSyncClient.java`: build `IMetaStoreClientPool` in HIVEQL branches as well (HMS branch already did so).
- `TestHiveSyncTool.java`: new end-to-end test.

## Out of scope

- JDBC DROP parallelization: JDBC's `constructDropPartitions` already batches by `HIVE_BATCH_SYNC_PARTITION_NUM` but runs sequentially; parallelizing it needs a JDBC `Connection` pool, tracked separately.
- Benchmarks vs HDrone on a production HMS: same blocker as #18984; pinning to a follow-up after #18983 lands.

Related: #18331

🤖 Generated with [Claude Code](https://claude.com/claude-code)
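
For readers skimming the `batching.enabled=true` behavior, here is a rough, self-contained sketch of the batch-and-fan-out pattern with first-error semantics. It is not the code in `HiveQueryDDLExecutor`: the class `BatchedDropSketch`, the methods `partition` and `dropInBatches`, and the `dropBatch` callback are hypothetical names, a plain `ExecutorService` stands in for `IMetaStoreClientPool`, and `System.err` stands in for WARN logging. "First" here means first in submission order, which is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical illustration only; not the actual Hudi implementation.
public class BatchedDropSketch {

  // Splits the list into consecutive batches of at most batchSize elements.
  public static <T> List<List<T>> partition(List<T> items, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      int end = Math.min(i + batchSize, items.size());
      batches.add(new ArrayList<>(items.subList(i, end)));
    }
    return batches;
  }

  // Fans batches out across a fixed pool of worker threads. dropBatch stands in
  // for "drop these partitions using this worker's own metastore client".
  public static void dropInBatches(List<String> partitions, int batchSize, int threads,
                                   Consumer<List<String>> dropBatch) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (List<String> batch : partition(partitions, batchSize)) {
        futures.add(pool.submit(() -> dropBatch.accept(batch)));
      }
      RuntimeException first = null;
      for (Future<?> f : futures) {
        try {
          f.get(); // wait for this batch to finish
        } catch (ExecutionException e) {
          if (first == null) {
            // Remember the first failure (in submission order) to rethrow.
            first = new RuntimeException("Failed to drop partitions", e.getCause());
          } else {
            // Later failures are only reported, not thrown.
            System.err.println("WARN: suppressed drop failure: " + e.getCause());
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          if (first == null) {
            first = new RuntimeException("Interrupted while dropping partitions", e);
          }
          break;
        }
      }
      if (first != null) {
        throw first;
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

With values mirroring the new test (`threads=3`, `batch_num=2`), dropping 4 partitions produces 2 batches that may run concurrently on separate workers.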
