devjue opened a new pull request, #16345:
URL: https://github.com/apache/dubbo/pull/16345
## What is the purpose of the change
Fixes #16344.
Eliminate the single-request failure window that occurs every time a Dubbo
Triple Consumer receives an HTTP/2 `GOAWAY(errorCode=0, lastStreamId=MAX_INT)`
frame from gateways / reverse proxies / providers that gracefully rotate
connections (e.g. `max_requests_per_connection`, `idle_timeout`, hot-restart
drain).
Previously, `AbstractNettyConnectionClient#onGoaway` nulled the channel
reference immediately and `NettyConnectionHandler#onGoAway` then scheduled a
reconnect. This left a brief window in which
`AbstractClusterInvoker#checkInvokers` threw `RpcException` before
FailoverCluster's retry loop was reached, so `retries=N` did NOT mitigate the
issue. This PR implements graceful migration: keep the old channel serving
until a new channel is ready, then atomically swap.
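The "keep serving, then atomically swap" idea can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `FakeChannel`, `GracefulChannelHolder`, and `migrate` are hypothetical names, and the real change operates on Netty channels inside `AbstractNettyConnectionClient`.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a connection; not a Dubbo or Netty type. */
final class FakeChannel {
    final String name;
    volatile boolean active = true;

    FakeChannel(String name) {
        this.name = name;
    }
}

/** Illustrative holder that keeps the old channel until a replacement is ready. */
final class GracefulChannelHolder {
    private final AtomicReference<FakeChannel> current = new AtomicReference<>();

    GracefulChannelHolder(FakeChannel initial) {
        current.set(initial);
    }

    /** Callers always see either the old or the new channel, never null mid-migration. */
    FakeChannel get() {
        return current.get();
    }

    /**
     * Swaps in the replacement only if it is active and the expected old
     * channel is still the current one. Returns false otherwise, leaving
     * the old channel in place so requests keep flowing.
     */
    boolean migrate(FakeChannel old, FakeChannel replacement) {
        if (replacement == null || !replacement.active) {
            return false;
        }
        return current.compareAndSet(old, replacement);
    }
}
```

In this sketch a failed migration leaves the old channel in place, which corresponds to the retry-then-fallback path described in the changelog below.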
## Brief changelog
- **`NettyConnectionHandler#onGoAway`**: keep old channel alive; schedule
`attemptGracefulMigration` on `connectivityExecutor` after
`GRACEFUL_RECONNECT_DELAY_MS = 200ms`; one retry on failure, then fall back to
`AbstractNettyConnectionClient#onGoaway`.
- **`NettyConnectionClient#initBootstrap`**: `closeFuture` listener now
calls `compareAndClearNettyChannel(ch)` instead of `clearNettyChannel()`, which
prevents the old channel's close listener from wiping a freshly swapped-in new
channel.
- **`AbstractNettyConnectionClient`**: add `compareAndClearNettyChannel`,
plus package-private `getConnectivityExecutor()` / `getReconnectDuration()`
accessors.
- **`NettyChannel#getChannelIfPresent`**: lookup-only API used by
`channelInactive` to avoid allocating a transient `NettyChannel` purely for
logging.
- **`TripleGoAwayHandler`**: log `errorCode` and `lastStreamId` so the
GOAWAY trigger is observable.
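To show why the compare-and-clear in the `closeFuture` listener matters, here is a small sketch built on `AtomicReference`. It assumes the channel reference can be modeled as a single atomic slot; `ChannelSlot` and its methods are hypothetical, and the actual `compareAndClearNettyChannel` implementation may differ.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-slot holder for the current channel reference. */
final class ChannelSlot {
    private final AtomicReference<Object> channel = new AtomicReference<>();

    void set(Object ch) {
        channel.set(ch);
    }

    Object get() {
        return channel.get();
    }

    /** Unconditional clear: wipes whatever is stored, including a newer channel. */
    void clear() {
        channel.set(null);
    }

    /**
     * Clears the slot only if it still holds the expected channel.
     * A late close callback from an old channel therefore cannot
     * remove a channel that was swapped in afterwards.
     */
    boolean compareAndClear(Object expected) {
        return channel.compareAndSet(expected, null);
    }
}
```

If the old channel's close callback fires after the swap, `compareAndClear(oldChannel)` returns `false` and the new channel stays in the slot, whereas `clear()` would have dropped it.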
## Verifying this change
Validated against a local Triple demo with sustained 1 QPS traffic:
- 26 GOAWAY frames received → 100% graceful migration success → **zero** `No
provider available` errors and **zero** request failures.
- Channel rotation chain (first 18 migrations):
`:45956 → :45110 → :53222 → :58366 → :56372 → :55436 → :55188 → :52926 →
:50682 → :48682 → :47626 → :44014 → :43116 → :45044 → :48602 → :51880 → :50992
→ :51672`
- No regression in existing `dubbo-remoting-netty4` and `dubbo-rpc-triple`
unit tests.
## Does this pull request potentially affect one of the following parts
- [ ] Dependencies (does it add or upgrade a dependency)
- [ ] The public API
- [x] The runtime per-connection behavior (HTTP/2 GOAWAY handling on the
Consumer side)
- [ ] The persistence format of the configurations
- [ ] The default values of configurations
- [ ] The threading model (uses existing `connectivityExecutor`; does NOT
introduce new threads)
- [ ] The serialization protocol
- [ ] The compatibility with previous versions
Backward-compatible: defaults are unchanged; the only observable difference
is that GOAWAY frames no longer cause a request failure window.
## Documentation
- Does this pull request introduce a new feature? **No** (bug fix)
- If yes, how is the feature documented? **Not applicable**
Happy to backport to `3.2` once this lands on `3.3`.