Zhengcy05 opened a new pull request, #16329:
URL: https://github.com/apache/dubbo/pull/16329
Fixes #16186
## What is the purpose of the change?
This PR fixes an issue in the multi-ZooKeeper registry scenario when one
registry is unavailable and configured with `check=false`.
Based on the issue report, in the `dubbo 3.3` branch, if multiple ZooKeeper
registries are configured and one of them is not started with `check=false`,
`Curator5ZookeeperClient` may keep an unconnected Curator client alive after
`blockUntilConnected(...)` returns `false`. As a result, the following registry
workflow still treats that client as usable and continues trying to create
nodes on it. This can cause repeated failures in node creation, significantly
slow down application startup, and eventually lead to startup failure.
From the call path, the problem has two parts:
1. `Curator5ZookeeperClient` does not fail fast enough on connection failure
If `blockUntilConnected(...)` returns `false`, the ZooKeeper client has
not been successfully established. This should always be treated as a client
creation failure, and the underlying Curator client should be closed
immediately so it cannot be reused later.
The `check=false` semantic should only control whether the upper layer
continues to throw when registry creation fails. It should not change whether
the low-level ZooKeeper client itself is considered successfully created.
2. `ListenerRegistryWrapper` is not null-safe when the upper layer returns a
`null` registry
In `AbstractRegistryFactory#getRegistry()`, when registry creation fails
and `check=false`, Dubbo logs a warning and returns `null`.
However, `RegistryFactoryWrapper` still wraps that result, and methods
such as `ListenerRegistryWrapper#getUrl()` previously dereferenced the delegate
directly. This could later trigger a `NullPointerException` in paths like
`RegistryDirectory#subscribe()`.
To address this, this PR includes two parts:
- Fix the connection failure handling in `Curator5ZookeeperClient`
- When `blockUntilConnected(...) == false`, always close the client first
and then throw an exception.
- Remove the dependency on `check` in this low-level failure path.
- Keep the client creation semantic consistent: if the connection is not
established, client creation fails.
- Add null-safe handling to `ListenerRegistryWrapper`
- Add a constructor that preserves the original `URL`.
- Return the original URL from `getUrl()` when `registry == null`.
- Return `false` from `isAvailable()` when `registry == null`.
- Make `destroy()` a safe no-op when `registry == null`.
- Add null protection to `unsubscribe()`, `isServiceDiscovery()`, and
`lookup()`.
- Keep the existing listener callback behavior unchanged.
With these changes:
- an unconnected ZooKeeper client will not be reused by the following
registry workflow;
- the existing `check=false` behavior at the registry factory layer remains
unchanged;
- a `null` registry can still be safely wrapped without introducing extra
NPEs.
## Tests
This PR adds focused regression coverage for both the low-level client
behavior and the upper-layer `check=false` flow:
- `Curator5ZookeeperClientTest`
- verifies that when `blockUntilConnected(...)` fails, constructing
`Curator5ZookeeperClient` throws `IllegalStateException` for both `check=true`
and `check=false`;
- verifies that `client.close()` is invoked on connection failure.
- `AbstractRegistryFactoryTest`
- verifies that when `createRegistry(url)` fails and `check=false`,
`getRegistry(url)` returns `null`, preserving the existing upper-layer behavior.
- `ListenerRegistryWrapperTest`
- verifies that `getUrl()`, `isAvailable()`, `destroy()`, `lookup()`,
`subscribe()`, and `unsubscribe()` are safe when `registry == null`;
- verifies that listener callbacks are still triggered as expected in the
`null registry` case.
- `RegistryFactoryWrapperTest`
- verifies that when the underlying `RegistryFactory#getRegistry(url)`
returns `null`, the wrapper still returns the original URL safely and allows
subscription-related calls without throwing exceptions.
Verification command used:
```bash
mvn -pl
dubbo-remoting/dubbo-remoting-zookeeper-curator5,dubbo-registry/dubbo-registry-api
\
-Dtest=Curator5ZookeeperClientTest,ListenerRegistryWrapperTest,RegistryFactoryWrapperTest,AbstractRegistryFactoryTest
\
-Dsurefire.failIfNoSpecifiedTests=false test -q
```
## Checklist
- [x] Make sure there is a
[GitHub_issue](https://github.com/apache/dubbo/issues) field for the change.
- [x] Write a pull request description that is detailed enough to understand
what the pull request does, how, and why.
- [x] Write necessary unit-test to verify your logic correction. If the new
feature or significant change is committed, please remember to add sample in
[dubbo samples](https://github.com/apache/dubbo-samples) project.
- [x] Make sure gitHub actions can pass. [Why the workflow is failing and
how to fix it?](../CONTRIBUTING.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]