wenzhenghu opened a new pull request, #63381:
URL: https://github.com/apache/doris/pull/63381
### What problem does this PR solve?
Issue Number: close #63325
Problem Summary:
**Problem**
- Starting from Doris `3.1.3`, FE uses `Jetty 12`, and this introduced a
compatibility change in the Stream Load redirect path.
- When a Stream Load request is sent to FE, FE may return `307 Temporary
Redirect` before the request body is fully consumed. Under `Jetty 12`, this
behavior is more likely to cause early connection close or reset while the
client is still writing the request body.
- As a result, some `HTTP/1.1` streaming clients may observe errors such as
`BrokenPipeError` or `ConnectionResetError` when sending Stream Load requests
through FE.
- The problem is more visible with chunked uploads, higher network latency,
and clients that continue sending request body data before fully processing the
redirect response.
- In short, this is a compatibility regression introduced by the `Jetty 12`
upgrade in Doris `3.1.3` and later.
**Fix**
- We keep the existing FE-to-BE redirect architecture unchanged, so FE still
redirects Stream Load requests to BE instead of proxying the full request body.
- We add a bounded request-body drain step on the FE Stream Load redirect
path:
- FE first writes the `307 Temporary Redirect` response.
- FE then drains and discards only a bounded amount of the remaining
request body.
- This provides a small compatibility window for in-flight client writes
and reduces the chance of early connection reset.
- We also apply the same handling to token-authenticated Stream Load
requests, so both password-authenticated and token-authenticated paths behave
consistently.
- In addition, we expose Jetty's unconsumed request content read setting
through FE configuration and apply it to HTTP connectors, so operators can tune
Jetty behavior for redirect scenarios where the request body is not fully
consumed.
- To make the compatibility path effective out of the box, this PR also
enables the bounded drain path by default with a `1GB` drain limit and a
`1000ms` idle wait window.
**New Configurations**
- `jetty_server_max_unconsumed_request_content_reads`
- Controls how many extra reads Jetty performs for unconsumed request
content.
- `-1` means unlimited, `0` use Jetty default value, and a positive value
sets the maximum number of read attempts.
- Default value in this PR: `0`.
- This helps tune Jetty behavior after the `Jetty 12` upgrade when FE
returns a response before the request body is fully consumed.
- `stream_load_redirect_bounded_drain_max_bytes`
- Controls the maximum number of request body bytes FE drains after
returning `307` for a Stream Load redirect.
- `0` disables this compatibility logic.
- A positive value enables bounded draining and limits how much data FE
will discard.
- Default value in this PR: `1GB`.
- `stream_load_redirect_bounded_drain_max_idle_time_ms`
- Controls how long FE waits for more readable request body data during
the bounded drain process.
- `0` disables the extra idle wait.
- A positive value provides a small grace window for slow clients or
delayed body chunks, helping absorb in-flight writes without keeping the
connection open indefinitely.
- Default value in this PR: `1000ms`.
**Test Result / Validation**
- Verified the behavior with the same Python `HTTP/1.1` chunked Stream Load
reproduction used during issue analysis.
- Reproduced requests were sent to FE with `Expect: 100-continue`,
`Transfer-Encoding: chunked`, and paced body streaming to maximize the redirect
race window.
- Baseline validation on Doris `3.0` (`9030` / FE `8030`):
- `payload_mb=1`, `chunk_kb=1`, `sleep_ms=0`
- `payload_mb=8`, `chunk_kb=16`, `sleep_ms=10`
- Both requests returned normal `307 Temporary Redirect`.
- Validation on the fixed Doris `3.1.4` instance (`9034` / FE `8034`):
- Before enabling the bounded drain config, the same reproduction still
triggered `BrokenPipeError`.
- After enabling the FE configs below:
- `jetty_server_max_unconsumed_request_content_reads = -1`
- `stream_load_redirect_bounded_drain_max_bytes = 16777216`
- `stream_load_redirect_bounded_drain_max_idle_time_ms = 1000`
- The same two reproduction requests both returned normal `307 Temporary
Redirect`.
- No `BrokenPipeError` or `ConnectionResetError` was observed after the
config took effect.
- The PR now further updates the default bounded drain byte limit from
`16MB` to `1GB`, while keeping the default idle wait at `1000ms`, so the
compatibility path is enabled by default with a more generous drain window.
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [x] Unit Test
- [x] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [x] No.
- [ ] Yes. FE now drains a bounded amount of request body after Stream
Load redirect by default, and the default drain byte limit is set to `1GB`.
- Does this need documentation?
- [ ] No.
- [x] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]