soulmem opened a new issue, #17527:
URL: https://github.com/apache/dolphinscheduler/issues/17527

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   When the master service starts in DolphinScheduler 3.3.1, the log shows an 
error while cleaning historical failover-finished nodes. The error message 
indicates that the ZooKeeper path is invalid (it does not start with `/`).
   The master service still starts successfully and the cluster works 
normally, but the startup log contains repeated error messages like this:
   `java.lang.IllegalArgumentException: Path must start with / character`
   `    at org.apache.curator...`
   `Caused by: org.apache.dolphinscheduler.registry.exception.RegistryException: zookeeper get data error`
   
   
   ### What you expected to happen
   
   **Expected Behavior**
   The master should clean expired failover finished nodes correctly without 
throwing exceptions.
   **Actual Behavior**
   
   - The method `cleanHistoryFailoverFinishedNodes()` calls:
   
   `final Collection<String> failoverFinishedNodes =`
   `    registry.children(RegistryNodeType.FAILOVER_FINISH_NODES.getRegistryPath());`
   `for (final String failoverFinishedNode : failoverFinishedNodes) {`
   `    final String failoverFinishTime = registry.get(failoverFinishedNode);`
   `    ...`
   `}`
   
   - However, `registry.children()` returns child node names (e.g. 
`20250921000123`), not absolute paths.
   
   - Passing these names directly to `registry.get()` results in an invalid 
ZooKeeper path (missing the leading `/`).
   
   **Root Cause**
   `failoverFinishedNode` should be combined with the parent path 
(`/nodes/failover-finish-nodes`) before being passed to `registry.get()` and 
`registry.delete()`. Currently only the child node name is passed, which causes 
Curator to throw `IllegalArgumentException: Path must start with / character`.
   **Workaround / Verification**
   
   - We patched the code to join the parent path with the child name before 
calling `get()` and `delete()`.
   
   - After applying this fix, the master starts without errors and failover 
finished nodes are cleaned as expected.
   
   **Suggestion**
   Update `cleanHistoryFailoverFinishedNodes()` to build the full path before 
accessing ZooKeeper nodes. For example:
   `String parent = RegistryNodeType.FAILOVER_FINISH_NODES.getRegistryPath();`
   `for (String child : registry.children(parent)) {`
   `    String fullPath = parent + "/" + child;`
   `    String ts = registry.get(fullPath);`
   `    ...`
   `    registry.delete(fullPath);`
   `}`
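
The behavior described above can also be reproduced without a ZooKeeper cluster. The following is only a minimal sketch under stated assumptions: `FailoverPathSketch` is a hypothetical in-memory stand-in, not the DolphinScheduler `Registry` API, its `validate()` merely imitates the leading-`/` check seen in the stack trace, and `join()` is an illustrative helper. The stored value is a placeholder, since the real node content is not shown in this report.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the registry (NOT the real DolphinScheduler API).
class FailoverPathSketch {

    // Absolute path -> node data.
    private final Map<String, String> nodes = new TreeMap<>();

    // Imitates the check from the stack trace: paths must start with '/'.
    static void validate(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("Path must start with / character");
        }
    }

    void put(String path, String data) {
        validate(path);
        nodes.put(path, data);
    }

    String get(String path) {
        validate(path);
        return nodes.get(path);
    }

    void delete(String path) {
        validate(path);
        nodes.remove(path);
    }

    // Returns direct child NAMES only, not absolute paths, as described in the issue.
    Collection<String> children(String parent) {
        validate(parent);
        String prefix = parent.endsWith("/") ? parent : parent + "/";
        List<String> names = new ArrayList<>();
        for (String path : nodes.keySet()) {
            boolean isDirectChild = path.startsWith(prefix)
                    && path.length() > prefix.length()
                    && path.indexOf('/', prefix.length()) < 0;
            if (isDirectChild) {
                names.add(path.substring(prefix.length()));
            }
        }
        return names;
    }

    // Illustrative helper: joins parent and child with exactly one '/' between them.
    static String join(String parent, String child) {
        String p = parent.endsWith("/") ? parent.substring(0, parent.length() - 1) : parent;
        String c = child.startsWith("/") ? child.substring(1) : child;
        return p + "/" + c;
    }

    public static void main(String[] args) {
        FailoverPathSketch registry = new FailoverPathSketch();
        String parent = "/nodes/failover-finish-nodes";
        registry.put(join(parent, "20250921000123"), "placeholder-finish-time");

        for (String child : registry.children(parent)) {
            try {
                registry.get(child); // bare child name: rejected
            } catch (IllegalArgumentException e) {
                System.out.println("bare child name rejected: " + e.getMessage());
            }
            String fullPath = join(parent, child); // full path: accepted
            System.out.println(fullPath + " -> " + registry.get(fullPath));
            registry.delete(fullPath);
        }
        System.out.println("remaining children: " + registry.children(parent));
    }
}
```

Compared with plain `parent + "/" + child`, the helper also tolerates a parent path that ends with `/`. Whether the real registry path can ever end that way is not confirmed here.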
   
   ### How to reproduce
   
   1. Deploy a DolphinScheduler 3.3.1 cluster with ZooKeeper as the registry.
   2. Start the master service.
   3. Check the startup log.
   4. Observe the error logged when `cleanHistoryFailoverFinishedNodes()` is executed.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.3.1
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
