devjue opened a new pull request, #16345:
URL: https://github.com/apache/dubbo/pull/16345

   ## What is the purpose of the change
   
   Fixes #16344.
   
   Eliminate the single-request failure window that occurs every time a Dubbo 
Triple Consumer receives an HTTP/2 `GOAWAY(errorCode=0, lastStreamId=MAX_INT)` 
frame from gateways / reverse proxies / providers that gracefully rotate 
connections (e.g. `max_requests_per_connection`, `idle_timeout`, hot-restart 
drain).
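
As a rough illustration of the frame shape described above, the check below classifies a GOAWAY by its two fields. `GoAwayFrames` and `isGracefulShutdownNotice` are hypothetical names for this sketch and are not part of Dubbo or Netty. The constants follow the HTTP/2 specification, where `NO_ERROR` is `0x0` and the largest stream identifier is 2^31-1.

```java
// Hypothetical helper for illustration only; not a Dubbo or Netty class.
final class GoAwayFrames {
    /** HTTP/2 NO_ERROR error code. */
    static final long NO_ERROR = 0x0L;
    /** Largest possible HTTP/2 stream identifier, 2^31 - 1. */
    static final int MAX_STREAM_ID = Integer.MAX_VALUE;

    private GoAwayFrames() {}

    /**
     * Returns true when the frame looks like an advance notice of graceful
     * shutdown: no error, and no stream has been ruled out yet.
     */
    static boolean isGracefulShutdownNotice(long errorCode, int lastStreamId) {
        return errorCode == NO_ERROR && lastStreamId == MAX_STREAM_ID;
    }
}
```

A GOAWAY with a non-zero error code, or with a concrete `lastStreamId`, would not match this predicate and is outside the scenario this PR describes.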
   
   Previously, `AbstractNettyConnectionClient#onGoaway` nulled the channel 
reference immediately and `NettyConnectionHandler#onGoAway` then scheduled a 
reconnect. This left a brief window in which 
`AbstractClusterInvoker#checkInvokers` threw `RpcException` before 
FailoverCluster's retry loop was reached, so `retries=N` did NOT mitigate the 
issue. This PR implements graceful migration: keep the old channel serving 
until a new channel is ready, then atomically swap.
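
The keep-then-swap idea can be sketched in isolation as follows. This is a minimal model under stated assumptions, not the code in this PR: `GracefulSwap`, `migrate`, and the `connector` supplier are hypothetical names, and the channel is represented by a generic type `C` rather than a Netty `Channel`.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical model of graceful migration; not the Dubbo implementation.
final class GracefulSwap<C> {
    private final AtomicReference<C> active;

    GracefulSwap(C initial) {
        this.active = new AtomicReference<>(initial);
    }

    /** The channel that new requests should use right now. */
    C current() {
        return active.get();
    }

    /**
     * Connects a replacement while the draining channel keeps serving, then
     * swaps only if the draining channel is still the active one.
     * Returns true if the swap happened.
     */
    boolean migrate(C draining, Supplier<C> connector) {
        C replacement;
        try {
            replacement = connector.get();
        } catch (RuntimeException e) {
            return false; // connect failed: the old channel stays in place
        }
        if (replacement == null) {
            return false;
        }
        // Reference comparison: succeeds only if nobody replaced it meanwhile.
        return active.compareAndSet(draining, replacement);
    }
}
```

Because `current()` never observes an empty reference during `migrate`, callers in this model always have a channel to send on, which is the property the old null-then-reconnect sequence lacked.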
   
   ## Brief changelog
   
   - **`NettyConnectionHandler#onGoAway`**: keep old channel alive; schedule 
`attemptGracefulMigration` on `connectivityExecutor` after 
`GRACEFUL_RECONNECT_DELAY_MS = 200ms`; one retry on failure, then fall back to 
`AbstractNettyConnectionClient#onGoaway`.
   - **`NettyConnectionClient#initBootstrap`**: the `closeFuture` listener now 
calls `compareAndClearNettyChannel(ch)` instead of `clearNettyChannel()`, which 
prevents the old channel's close listener from wiping a freshly swapped-in new 
channel.
   - **`AbstractNettyConnectionClient`**: add `compareAndClearNettyChannel`, 
plus package-private `getConnectivityExecutor()` / `getReconnectDuration()` 
accessors.
   - **`NettyChannel#getChannelIfPresent`**: lookup-only API used by 
`channelInactive` to avoid allocating a transient `NettyChannel` purely for 
logging.
   - **`TripleGoAwayHandler`**: log `errorCode` and `lastStreamId` so the 
GOAWAY trigger is observable.
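
The difference between an unconditional clear and a compare-and-clear can be shown with a small stand-in. `ChannelHolder` below is a hypothetical class for illustration; the actual `compareAndClearNettyChannel` in this PR may be implemented differently.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the client's channel reference; not Dubbo code.
final class ChannelHolder<C> {
    private final AtomicReference<C> ref = new AtomicReference<>();

    void set(C channel) {
        ref.set(channel);
    }

    C get() {
        return ref.get();
    }

    /** Unconditional clear: wipes whatever is held, even a newer channel. */
    void clear() {
        ref.set(null);
    }

    /** Clears only if the given channel is still the one being held. */
    boolean compareAndClear(C expected) {
        return ref.compareAndSet(expected, null);
    }
}
```

In this model, a close listener registered on the old channel calls `compareAndClear(oldChannel)`. If a new channel has already been swapped in, the call returns false and leaves the new channel untouched.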
   
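The delayed attempt, single retry, and fallback described for `NettyConnectionHandler#onGoAway` could look roughly like the sketch below. `MigrationScheduler` and its parameters are hypothetical names, and the sketch assumes the retry reuses the same delay as the first attempt, which the description above does not specify.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not Dubbo's NettyConnectionHandler.
final class MigrationScheduler {
    enum Outcome { MIGRATED, FELL_BACK }

    private MigrationScheduler() {}

    /**
     * Runs `attempt` after `delayMs`. On failure it reschedules until
     * `maxAttempts` is used up (assumption: same delay each time), then
     * runs `fallback`.
     */
    static CompletableFuture<Outcome> schedule(
            ScheduledExecutorService executor, long delayMs, int maxAttempts,
            BooleanSupplier attempt, Runnable fallback) {
        CompletableFuture<Outcome> result = new CompletableFuture<>();
        scheduleAttempt(executor, delayMs, maxAttempts, attempt, fallback, result);
        return result;
    }

    private static void scheduleAttempt(
            ScheduledExecutorService executor, long delayMs, int attemptsLeft,
            BooleanSupplier attempt, Runnable fallback,
            CompletableFuture<Outcome> result) {
        executor.schedule(() -> {
            boolean migrated;
            try {
                migrated = attempt.getAsBoolean();
            } catch (RuntimeException e) {
                migrated = false; // treat an exception as a failed attempt
            }
            if (migrated) {
                result.complete(Outcome.MIGRATED);
            } else if (attemptsLeft > 1) {
                scheduleAttempt(executor, delayMs, attemptsLeft - 1,
                        attempt, fallback, result);
            } else {
                try {
                    fallback.run(); // e.g. the legacy clear-and-reconnect path
                    result.complete(Outcome.FELL_BACK);
                } catch (RuntimeException e) {
                    result.completeExceptionally(e);
                }
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }
}
```

With `delayMs = 200` and `maxAttempts = 2` this mirrors "one retry on failure, then fall back". The returned future is only there to make the outcome observable in the sketch.
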
   ## Verifying this change
   
   Validated against a local Triple demo with sustained 1 QPS traffic:
   
   - 26 GOAWAY frames received → 100% graceful migration success → **zero** `No 
provider available` errors and **zero** request failures.
   - Channel rotation chain (first 18 migrations):
     `:45956 → :45110 → :53222 → :58366 → :56372 → :55436 → :55188 → :52926 → 
:50682 → :48682 → :47626 → :44014 → :43116 → :45044 → :48602 → :51880 → :50992 
→ :51672`
   - No regression in existing `dubbo-remoting-netty4` and `dubbo-rpc-triple` 
unit tests.
   
   ## Does this pull request potentially affect one of the following parts
   
   - [ ] Dependencies (does it add or upgrade a dependency)
   - [ ] The public API
   - [x] The runtime per-connection behavior (HTTP/2 GOAWAY handling on the 
Consumer side)
   - [ ] The persistence format of the configurations
   - [ ] The default values of configurations
   - [ ] The threading model (uses existing `connectivityExecutor`; does NOT 
introduce new threads)
   - [ ] The serialization protocol
   - [ ] The compatibility with previous versions
   
   Backward-compatible: defaults are unchanged; the only observable difference 
is that GOAWAY frames no longer cause a request failure window.
   
   ## Documentation
   
   - Does this pull request introduce a new feature? **No** (bug fix)
   - If yes, how is the feature documented? **Not applicable**
   
   Happy to backport to `3.2` once this lands on `3.3`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

