lhotari opened a new pull request, #25919:
URL: https://github.com/apache/pulsar/pull/25919
### Motivation
`AdminProxyHandler` (the Pulsar admin proxy) follows broker HTTP `307`
redirects internally: when the proxy forwards an admin request to a broker that
does not own the topic's bundle, the broker responds with `307` and the proxy
follows the redirect to the owner broker, replaying the request body.

Intermittently, admin requests **that carry a body** (e.g. `PUT
createNonPartitionedTopic`, `POST setDelayedDeliveryPolicy`) fail through the
proxy with **HTTP 502 Bad Gateway**. The failure is timing-dependent and
surfaces as flakiness in `ProxyRedirectTest.testProxyHandlesReplayingContent`,
more frequently on CPU-constrained environments such as CI runners. Requests
**without** a body (`GET`) are unaffected.

Root cause: Jetty's `RedirectProtocolHandler.onSuccess(Response)` aborts the
in-flight request with `HttpRequestException: "Aborting request after receiving
a 307 response"` whenever the request still has a body to send
(`request.getBody() != null`), in order to stop streaming the body:
```java
public void onSuccess(Response response) {
    // The request may still be sending content, stop it.
    Request request = response.getRequest();
    if (request.getBody() != null)
        request.abort(new HttpRequestException(
                "Aborting request after receiving a %d response"
                        .formatted(response.getStatus()), request));
}
```
When the broker returns the `307` **before the proxy has finished sending
the request body**, that abort races ahead of the redirect continuation
(`onComplete`) and is delivered to the proxy as the request failure.
`AbstractProxyServlet.onProxyResponseFailure` then maps it (a
non-`TimeoutException`) to **HTTP 502 Bad Gateway**. The cause is logged by
Jetty only at `DEBUG`, which is why it is invisible in normal proxy logs.
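
The failure-to-status mapping described above can be sketched in plain Java. This is a simplified stand-in rather than Jetty's code: the class `FailureStatusSketch` and the method `statusFor` are hypothetical names, a plain `RuntimeException` stands in for `HttpRequestException`, and the `504` branch is inferred from the "non-`TimeoutException`" wording and the `504`/idle-timeout path mentioned in this description.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the mapping described above; not Jetty's implementation.
public class FailureStatusSketch {
    static final int BAD_GATEWAY = 502;
    static final int GATEWAY_TIMEOUT = 504;

    // Timeouts map to 504; every other failure, including the abort raised
    // by the redirect handler, maps to 502.
    static int statusFor(Throwable failure) {
        return failure instanceof TimeoutException ? GATEWAY_TIMEOUT : BAD_GATEWAY;
    }

    public static void main(String[] args) {
        // Assumption: RuntimeException stands in for Jetty's HttpRequestException.
        Throwable abort = new RuntimeException("Aborting request after receiving a 307 response");
        System.out.println(statusFor(abort));                                // prints 502
        System.out.println(statusFor(new TimeoutException("idle timeout"))); // prints 504
    }
}
```

Under this model, the abort raised on the `307` is indistinguishable from a genuine upstream failure, which is consistent with the client seeing a 502 rather than a followed redirect.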
### Modifications
- Add `NonAbortingRedirectProtocolHandler`, a `RedirectProtocolHandler`
subclass whose `onSuccess` does not abort the in-flight request.
- Register it in `AdminProxyHandler#customizeHttpClient` in place of the
stock `RedirectProtocolHandler`.

The redirect is still driven by `RedirectProtocolHandler#onComplete` from
the response status and `Location` header, so redirects are followed exactly as
before (the body is replayed by the existing `ReplayableProxyContentProvider`);
only the abort that produced the spurious 502 is removed.
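
The shape of the override can be illustrated with a toy model that has no Jetty or Pulsar dependency. Everything in it is hypothetical (`ToyRequest`, `ToyRedirectHandler`, `ToyNonAbortingRedirectHandler`, and the example `Location` value); it only mirrors the idea that the subclass suppresses the abort in `onSuccess` while inheriting the redirect decision made in `onComplete`. It is not the code added by this pull request.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of the override pattern; none of these types are Jetty or Pulsar classes.
public class NonAbortingRedirectSketch {

    // Minimal stand-in for an in-flight request that records its first abort cause.
    static class ToyRequest {
        final boolean hasBody;
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        ToyRequest(boolean hasBody) {
            this.hasBody = hasBody;
        }

        void abort(Throwable cause) {
            failure.compareAndSet(null, cause);
        }
    }

    // Stand-in for the stock handler: aborts any request that carries a body.
    static class ToyRedirectHandler {
        void onSuccess(ToyRequest request, int status) {
            if (request.hasBody) {
                request.abort(new RuntimeException(
                        String.format("Aborting request after receiving a %d response", status)));
            }
        }

        // The redirect decision depends only on the status and the Location value.
        String onComplete(int status, String location) {
            return status == 307 ? location : null;
        }
    }

    // Stand-in for the subclass: same redirect decision, no abort.
    static class ToyNonAbortingRedirectHandler extends ToyRedirectHandler {
        @Override
        void onSuccess(ToyRequest request, int status) {
            // Intentionally empty: the in-flight request is left alone.
        }
    }

    public static void main(String[] args) {
        ToyRequest stock = new ToyRequest(true);
        new ToyRedirectHandler().onSuccess(stock, 307);
        System.out.println("stock aborted: " + (stock.failure.get() != null));          // prints true

        ToyRequest patched = new ToyRequest(true);
        ToyRedirectHandler handler = new ToyNonAbortingRedirectHandler();
        handler.onSuccess(patched, 307);
        System.out.println("non-aborting aborted: " + (patched.failure.get() != null)); // prints false

        // Hypothetical Location value, used only for illustration.
        System.out.println("redirect target: " + handler.onComplete(307, "http://owner-broker.example/admin"));
    }
}
```

In the toy model, both handlers return the same redirect target from `onComplete`; only the side effect on the in-flight request differs.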
### Verifying this change
This change is already covered by existing tests, specifically
`ProxyRedirectTest.testProxyHandlesReplayingContent`, which exercises the proxy
redirect + request-body-replay path.

It was additionally validated against a reliable local reproduction: with
the test's topic loop raised to 5000 and run in a single-CPU container, the 502
reproduced within ~1–3 minutes **before** this change, and passed **5/5**
consecutive runs **after** it (with no regression to a `504`/idle-timeout path).
### Does this pull request potentially affect one of the following parts:
*If the box was checked, please highlight the changes*
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment