nsivabalan opened a new pull request, #19033:
URL: https://github.com/apache/hudi/pull/19033

   ## Summary
   - Parallelizes `dropPartitionsToTable` in the HiveQL sync mode (`hoodie.datasource.hive_sync.mode=hiveql`).
   - Stacked on **#18984**. The diff currently includes both the HMS commit (#18983) and the HiveQL commit (#18984); once both merge, this PR will rebase cleanly onto master.
   - Reviewing the top commit `7022e7d49341` in isolation shows the DROP-only delta (~119 lines added / 16 removed).
   
   ## What this fixes
   HiveQL DROP goes through `IMetaStoreClient.dropPartition` (Thrift), not the Hive Driver, so it cannot reuse the `HiveDriverPool` introduced in #18984. Today it loops over the partitions sequentially against the session metastore client. This PR wires the existing `IMetaStoreClientPool` from #18983 into `HiveQueryDDLExecutor` and uses it to fan out drop batches across the pool's workers.
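The "one independent client per worker" idea can be illustrated with plain JDK concurrency. This is a minimal, hypothetical sketch, not the `IMetaStoreClientPool` from #18983: `PerWorkerClientPool` and `FakeClient` are invented names, and a real pool would construct Thrift `IMetaStoreClient` instances (which are not safe to share across threads) instead of counting objects.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: a fixed thread pool in which every worker thread lazily
// creates, then reuses, its own client, so no client is shared between threads.
public class PerWorkerClientPool<C> implements AutoCloseable {
  private final ExecutorService workers;
  private final ThreadLocal<C> clientPerWorker;

  public PerWorkerClientPool(int threads, Supplier<C> clientFactory) {
    this.workers = Executors.newFixedThreadPool(threads);
    this.clientPerWorker = ThreadLocal.withInitial(clientFactory);
  }

  // Runs a task on some worker, handing it that worker's private client.
  public <R> Future<R> submit(Function<C, R> task) {
    return workers.submit(() -> task.apply(clientPerWorker.get()));
  }

  @Override
  public void close() throws InterruptedException {
    workers.shutdown();
    workers.awaitTermination(1, TimeUnit.MINUTES);
  }

  // Counts how many stand-in clients have been constructed.
  static final AtomicInteger CREATED = new AtomicInteger();

  // Stand-in for IMetaStoreClient; it only records an instance id.
  static final class FakeClient {
    final int id = CREATED.incrementAndGet();
  }

  public static void main(String[] args) throws Exception {
    try (PerWorkerClientPool<FakeClient> pool = new PerWorkerClientPool<>(3, FakeClient::new)) {
      for (int i = 0; i < 10; i++) {
        pool.submit(client -> client.id).get();
      }
    }
    // No more than one client per worker thread, however many tasks ran.
    System.out.println("clients created: " + CREATED.get());
  }
}
```

The `ThreadLocal` keeps each client confined to the thread that created it; a borrow/return queue of clients would be an equally valid design.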
   
   Behavior:
   - `batching.enabled=false` (default): unchanged. `dropPartitionsToTable` iterates over the partition list sequentially on the session metastore client, exactly as before.
   - `batching.enabled=true`: partitions are split into batches of `HIVE_BATCH_SYNC_PARTITION_NUM`, and the batches fan out across the pool's workers (one independent `IMetaStoreClient` per worker). First-error semantics match the HMS-mode implementation: the first failure is thrown, and subsequent failures are logged at WARN and suppressed.
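A minimal sketch of the batching-enabled path, under stated assumptions: `BatchedDropSketch`, `toBatches`, `dropInParallel`, and the `Consumer<List<String>>` drop callback are hypothetical stand-ins (the real code calls `IMetaStoreClient.dropPartition` on a pooled client). It shows chunking by batch size, one task per batch on a fixed thread pool, and first-error handling. "First" is read here as first in batch order, which is an assumption rather than something the PR specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.logging.Logger;

public class BatchedDropSketch {
  private static final Logger LOG = Logger.getLogger(BatchedDropSketch.class.getName());

  // Splits partitions into consecutive chunks of at most batchSize elements.
  static List<List<String>> toBatches(List<String> partitions, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < partitions.size(); i += batchSize) {
      batches.add(partitions.subList(i, Math.min(i + batchSize, partitions.size())));
    }
    return batches;
  }

  // Submits one task per batch and waits for all of them. The first failure
  // (in batch order) is rethrown; later failures are only logged as warnings.
  static void dropInParallel(List<String> partitions, int batchSize, int threads,
                             Consumer<List<String>> dropBatch) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (List<String> batch : toBatches(partitions, batchSize)) {
        futures.add(pool.submit(() -> dropBatch.accept(batch)));
      }
      Throwable firstCause = null;
      for (Future<?> future : futures) {
        try {
          future.get();
        } catch (ExecutionException e) {
          if (firstCause == null) {
            firstCause = e.getCause();
          } else {
            LOG.warning("Suppressed additional drop failure: " + e.getCause());
          }
        }
      }
      if (firstCause != null) {
        throw new IllegalStateException("Failed to drop partition batch", firstCause);
      }
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // In-memory stand-in for the metastore: 8 partitions, 4 of which get dropped.
    Set<String> remaining = new ConcurrentSkipListSet<>();
    for (int i = 1; i <= 8; i++) {
      remaining.add("p" + i);
    }
    dropInParallel(List.of("p1", "p2", "p3", "p4"), 2, 3, batch -> batch.forEach(remaining::remove));
    System.out.println(remaining); // [p5, p6, p7, p8]
  }
}
```

With the new test's parameters (4 partitions to drop, `batch_num=2`, `threads=3`), `toBatches` yields 2 batches, so at most 2 of the 3 workers are busy at once. This sketch waits for every batch to finish before surfacing an error, so no drop is still in flight when the exception propagates.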
   
   ## Configs
   No new configs. Reuses everything from #18983 / #18984:
   
   | Key | Default |
   |---|---|
   | `hoodie.datasource.hive_sync.batching.enabled` | `false` |
   | `hoodie.datasource.hive_sync.batching.threads` | `4` |
   | `hoodie.datasource.hive_sync.batch_num` | `1000` |
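For illustration, the keys above could be collected in a `java.util.Properties` object before being handed to Hive sync. Only the key names and defaults come from this PR; the chosen values (8 threads, 500 partitions per batch) are arbitrary examples, and `HiveSyncBatchingConfig` is a hypothetical helper. How the properties reach the sync tool in a given deployment is not shown.

```java
import java.util.Properties;

public class HiveSyncBatchingConfig {
  // Builds properties that route HiveQL-mode drops onto the parallel path.
  static Properties batchingProps() {
    Properties props = new Properties();
    props.setProperty("hoodie.datasource.hive_sync.mode", "hiveql");
    props.setProperty("hoodie.datasource.hive_sync.batching.enabled", "true"); // default: false
    props.setProperty("hoodie.datasource.hive_sync.batching.threads", "8");    // default: 4
    props.setProperty("hoodie.datasource.hive_sync.batch_num", "500");         // default: 1000
    return props;
  }

  public static void main(String[] args) {
    batchingProps().forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```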
   
   ## Test plan
   - [x] `mvn compile` on `hudi-sync/hudi-hive-sync`: clean, 0 Checkstyle violations, 0 RAT issues
   - [x] `mvn test` on `hudi-sync/hudi-hive-sync`: **309 tests, 0 failures, 0 errors** (308 on the parent branch)
   - [x] `TestHiveSyncTool#testHiveQLDropPartitionsWithBatching` (new): creates 8 partitions, drops 4 through the parallel pool path with `threads=3` and `batch_num=2` (so multiple drop batches are dispatched in parallel), and asserts that the remaining partition set matches.
   - [x] Existing 308 tests across all three sync modes pass unchanged.
   
   ## Files touched (top commit only)
   - `HiveQueryDDLExecutor.java`: new constructor accepting `Option<IMetaStoreClientPool>`; `dropPartitionsToTable` now batches and fans out when the pool is present, and falls back to the sequential single-client path otherwise.
   - `HoodieHiveSyncClient.java`: builds the `IMetaStoreClientPool` in the HIVEQL branches as well (the HMS branch already did so).
   - `TestHiveSyncTool.java`: new end-to-end test.
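The optional-pool constructor pattern can be sketched as follows. This is hypothetical: `DdlExecutorSketch` and `ParallelDropper` are invented names, `java.util.Optional` stands in for Hudi's own `Option` type, and keeping a pool-less constructor that delegates with an empty pool is an assumption about how existing callers stay on the sequential path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical shape of an executor whose pool dependency is optional.
public class DdlExecutorSketch {
  // Stand-in for a pooled, parallel drop path.
  interface ParallelDropper {
    void dropAll(List<String> partitions);
  }

  private final Consumer<String> sessionClientDrop;
  private final Optional<ParallelDropper> pool;

  // Pool-less constructor: behaviour stays sequential.
  DdlExecutorSketch(Consumer<String> sessionClientDrop) {
    this(sessionClientDrop, Optional.empty());
  }

  // Constructor that lets the caller supply a pool.
  DdlExecutorSketch(Consumer<String> sessionClientDrop, Optional<ParallelDropper> pool) {
    this.sessionClientDrop = sessionClientDrop;
    this.pool = pool;
  }

  // Returns which path ran, purely so the example can be checked.
  String dropPartitionsToTable(List<String> partitions) {
    if (pool.isPresent()) {
      pool.get().dropAll(partitions);
      return "parallel";
    }
    partitions.forEach(sessionClientDrop); // one partition at a time on the session client
    return "sequential";
  }

  public static void main(String[] args) {
    List<String> dropped = new ArrayList<>();
    DdlExecutorSketch sequential = new DdlExecutorSketch(dropped::add);
    System.out.println(sequential.dropPartitionsToTable(List.of("p1", "p2")) + " " + dropped);
    // prints "sequential [p1, p2]"
  }
}
```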
   
   ## Out of scope
   - JDBC DROP parallelization: JDBC's `constructDropPartitions` already batches by `HIVE_BATCH_SYNC_PARTITION_NUM` but runs the batches sequentially; parallelizing it requires a JDBC `Connection` pool, which is tracked separately.
   - Benchmarks against HDrone on a production HMS: same blocker as #18984; deferred to a follow-up after #18983 lands.
   
   Related: #18331
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

