kbuci opened a new pull request, #18123:
URL: https://github.com/apache/hudi/pull/18123

   
   **Summary:** When the ZooKeeper-based lock provider is used, the lock node in 
ZooKeeper now stores metadata (including the application id) so that the current 
lock holder can be identified. The application id is taken from the engine 
context (e.g. the Spark application id) and passed through the write config into 
the lock config and then into `HoodieInterProcessMutex`, which writes it into the 
ZK lock node.
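The propagation path described above (engine context, then write config, then lock properties) can be sketched in plain Java. This is a minimal, self-contained illustration, not the PR's code: `EngineContext`, `FakeSparkContext`, `WriteConfig`, and `buildLockProps` are hypothetical stand-ins for `HoodieEngineContext`, `HoodieSparkEngineContext`, `HoodieWriteConfig`, and the `LockManager` step. Only the `hoodie.write.lock.app_id` key and the `"Unknown"` default come from the description; the sample application id is made up.

```java
import java.util.Properties;

// Sketch of how an application id could flow from an engine context into the
// lock properties handed to a lock provider. All types here are hypothetical.
public class AppIdPropagationSketch {

  // Config key named in the PR's changelog.
  static final String LOCK_HOLDER_APP_ID_KEY = "hoodie.write.lock.app_id";

  // Stand-in for HoodieEngineContext: engines without an id report "Unknown".
  interface EngineContext {
    default String getApplicationId() {
      return "Unknown";
    }
  }

  // Stand-in for a Spark-backed context that returns the engine's real id.
  static final class FakeSparkContext implements EngineContext {
    private final String sparkAppId;
    FakeSparkContext(String sparkAppId) { this.sparkAppId = sparkAppId; }
    @Override public String getApplicationId() { return sparkAppId; }
  }

  // Stand-in for HoodieWriteConfig: holds the id and the raw lock properties.
  static final class WriteConfig {
    private String applicationId = "Unknown";
    private final Properties lockProps;
    WriteConfig(Properties lockProps) { this.lockProps = lockProps; }
    String getApplicationId() { return applicationId; }
    void setApplicationId(String id) { this.applicationId = id; }
    Properties getLockProps() { return lockProps; }
  }

  // Mirrors the described LockManager step: copy the lock props, then add the
  // app id, leaving the caller's original Properties object untouched.
  static Properties buildLockProps(WriteConfig config) {
    Properties copy = new Properties();
    copy.putAll(config.getLockProps());
    copy.setProperty(LOCK_HOLDER_APP_ID_KEY, config.getApplicationId());
    return copy;
  }

  public static void main(String[] args) {
    Properties raw = new Properties();
    raw.setProperty("hoodie.write.lock.zookeeper.url", "zk1:2181");
    WriteConfig config = new WriteConfig(raw);

    // What the write client constructor is described as doing.
    EngineContext ctx = new FakeSparkContext("application_1700000000000_0001");
    config.setApplicationId(ctx.getApplicationId());

    Properties lockProps = buildLockProps(config);
    System.out.println(lockProps.getProperty(LOCK_HOLDER_APP_ID_KEY));
  }
}
```

Copying the properties before adding the key keeps the user-supplied lock config unchanged, which matches the "copies lock props" wording in the changelog.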
   
   **Changelog:**
   - **hudi-common**
     - `LockConfiguration`: Added `LOCK_HOLDER_APP_ID_KEY` 
(`hoodie.write.lock.app_id`).
     - `HoodieEngineContext`: Added default `getApplicationId()` returning 
`"Unknown"`.
   - **hudi-client-common**
     - Added `HoodieInterProcessMutex`: Extends Curator's `InterProcessMutex` and 
overrides `getLockNodeBytes()` to set the lock node data from `LockConfiguration` 
(including the application id).
     - `BaseZookeeperBasedLockProvider`: Uses `HoodieInterProcessMutex` instead 
of `InterProcessMutex`, passing `LockConfiguration` so lock node bytes include 
app id.
     - `LockManager`: When building `LockConfiguration`, copies lock props and 
sets `LOCK_HOLDER_APP_ID_KEY` from `writeConfig.getApplicationId()` so the lock 
provider receives the app id.
     - `HoodieWriteConfig`: Added `applicationId` (default `"Unknown"`), 
`getApplicationId()`, and `setApplicationId(String)`.
     - `BaseHoodieWriteClient`: In both constructors that take 
`HoodieEngineContext`, added 
`config.setApplicationId(context.getApplicationId())` so the write config gets 
the engine’s application id for use by `LockManager`.
   - **hudi-spark-client**
     - `HoodieSparkEngineContext`: Overrode `getApplicationId()` to return 
`javaSparkContext.sc().applicationId()`.
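The central piece of `HoodieInterProcessMutex` is an override of Curator's protected `getLockNodeBytes()` hook. The sketch below models that override without a Curator dependency: `BaseMutex` is a hypothetical stand-in for `InterProcessMutex`, `AppIdMutex` stands in for `HoodieInterProcessMutex`, and it reads plain `Properties` instead of a `LockConfiguration`. The single-entry UTF-8 `application_id=<value>` encoding is an assumption drawn from the Impact section; the PR may write additional metadata.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Sketch of a getLockNodeBytes() override. BaseMutex is a hypothetical stand-in
// for Curator's InterProcessMutex so the example runs without Curator.
public class LockNodeBytesSketch {

  static final String LOCK_HOLDER_APP_ID_KEY = "hoodie.write.lock.app_id";

  // Stand-in base class: supplies no custom node data by default.
  static class BaseMutex {
    protected byte[] getLockNodeBytes() {
      return null;
    }
  }

  // Stand-in for HoodieInterProcessMutex: derives node data from lock properties.
  static class AppIdMutex extends BaseMutex {
    private final Properties lockProps;

    AppIdMutex(Properties lockProps) {
      this.lockProps = lockProps;
    }

    @Override
    protected byte[] getLockNodeBytes() {
      // Fall back to "Unknown", matching the PR's stated default.
      String appId = lockProps.getProperty(LOCK_HOLDER_APP_ID_KEY, "Unknown");
      // Assumed encoding: a single UTF-8 "application_id=<value>" entry.
      return ("application_id=" + appId).getBytes(StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(LOCK_HOLDER_APP_ID_KEY, "application_1700000000000_0001");
    byte[] data = new AppIdMutex(props).getLockNodeBytes();
    System.out.println(new String(data, StandardCharsets.UTF_8));
  }
}
```

Overriding the hook rather than writing to ZooKeeper separately means the metadata is stored on the same node Curator creates when acquiring the lock, so it lives and disappears with that node.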
   
   ### Impact
   
   - **User-facing:** No breaking changes to public APIs; the new methods and the 
config key are additive. Existing ZK lock config continues to work, and the 
application id defaults to `"Unknown"` if not set.
   - **Behavior:** Lock nodes created by the ZK lock provider now store 
`application_id=<value>` in the node data, so tools such as zkCli can show which 
application holds the lock. Spark users get the Spark application id 
automatically; other engines report `"Unknown"` unless they override 
`getApplicationId()` or set it on `HoodieWriteConfig`.
   - **Performance:** Negligible (one string in lock config and in ZK node 
data).
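On the reader side, an operator tool only needs to decode the node bytes (for example, what zkCli's `get` command prints). The hypothetical `LockHolderReader.applicationId` helper below assumes the data is UTF-8 text made of newline-separated `key=value` entries, one of which is `application_id`; that layout is an assumption, not something the PR specifies. It returns an empty `Optional` when the node has no data or no `application_id` entry, for example a lock node created before this change.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical reader-side helper: extract the application_id entry from raw
// lock node bytes. Assumes newline-separated key=value entries in UTF-8.
public class LockHolderReader {

  static Optional<String> applicationId(byte[] nodeData) {
    if (nodeData == null) {
      return Optional.empty(); // node created without custom data
    }
    String text = new String(nodeData, StandardCharsets.UTF_8);
    // \R matches any line break sequence (\n, \r\n, ...).
    for (String line : text.split("\\R")) {
      int eq = line.indexOf('=');
      if (eq > 0 && line.substring(0, eq).trim().equals("application_id")) {
        return Optional.of(line.substring(eq + 1).trim());
      }
    }
    return Optional.empty(); // no application_id entry recorded
  }

  public static void main(String[] args) {
    byte[] data = "application_id=application_1700000000000_0001"
        .getBytes(StandardCharsets.UTF_8);
    System.out.println(applicationId(data).orElse("<not recorded>"));
  }
}
```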
   
   ### Risk Level
   
   **Low.** Changes are additive and backward compatible: default application 
id is `"Unknown"`, and existing ZK lock behavior is unchanged except for the 
extra metadata in the lock node. `TestHoodieInterProcessMutex` verifies 
`getLockNodeBytes()` behavior; existing ZK lock tests remain valid.
   
   ### Documentation Update
   
   None. No new user-facing config is required: `hoodie.write.lock.app_id` is 
optional and is populated internally by `LockManager` from the write config's 
application id. No website or config-doc changes are needed unless we later 
document this for operators.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
