bingquanzhao opened a new pull request, #63181:
URL: https://github.com/apache/doris/pull/63181
Replace HttpClient5 async with HttpClient4 sync to fix
CircularRedirectException (1.2.0 -> 1.2.1)
The logstash-output-doris plugin uses the Apache HttpClient5 async client to PUT
stream load requests. Against a SelectDB Cloud / BYOC FE, which answers stream
load requests with '307 + Connection: close', the async client fails with
CircularRedirectException under any meaningful concurrency or body size.
Root cause:
1. HC5 async does not strictly block body transmission while waiting for
'100 Continue'. When FE returns 307 before issuing 100, the entity producer has
already started writing; FE closing the connection then yields an IOException
mid-transfer.
2. HC5 default exec chain wraps RedirectExec around
AsyncHttpRequestRetryExec. The recoverable IOException triggers an internal
retry that re-enters the same FE -> 307 path, but RedirectLocations from the
first attempt is still populated, so the same BE URL is detected as 'already
visited' and reported as a circular redirect.
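
The interaction in point 2 can be illustrated with a toy model. This is only a sketch of the described behavior, not HC5 code: `RedirectTracker`, `send_with_retry`, and the BE URL are hypothetical names invented for the illustration. It shows why a visited-locations set that survives an inner retry reports the second, legitimate 307 as circular, while a fresh set per attempt does not.

```ruby
require "set"

# Toy model only: these names are hypothetical and do not mirror HC5 internals.
class CircularRedirect < StandardError; end

# Remembers every redirect target it has followed, like a visited-locations list.
class RedirectTracker
  def initialize
    @visited = Set.new
  end

  def follow(location)
    raise CircularRedirect, "circular redirect to #{location}" if @visited.include?(location)
    @visited << location
    location
  end
end

BE_URL = "http://be-host:8040/api/db/tbl/_stream_load" # hypothetical BE address

# Each attempt models "FE answers 307 pointing at the same BE URL".
# With attempts > 1, the same tracker is reused, which models a retry
# that runs underneath the redirect layer after a mid-transfer IOException.
def send_with_retry(tracker, attempts: 2)
  attempts.times.map do
    begin
      tracker.follow(BE_URL)
      :redirected
    rescue CircularRedirect
      :circular_redirect
    end
  end
end

# Shared tracker across the retry: the second attempt is flagged as circular.
shared = send_with_retry(RedirectTracker.new)

# Fresh tracker per attempt: both attempts follow the redirect normally.
fresh = 2.times.map { send_with_retry(RedirectTracker.new, attempts: 1).first }
```

In this model `shared` ends with a `:circular_redirect` entry even though no loop exists, which matches the failure shape described above.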
This is a real HC5-vs-HC4 implementation difference, not a configuration
issue. The Doris Flink connector also follows FE 307 to BE in its default path
(autoRedirect=true) and works correctly precisely because it uses HC4 sync: HC4
honors 'Expect: 100-continue' strictly, so when FE 307s without sending 100,
the entity is left unconsumed and HC4's RedirectExec follows the redirect
normally.
This patch aligns the plugin with the Flink connector's HTTP layer:
- bump gem version 1.2.0 -> 1.2.1
- httpclient5 5.4.2 (async) -> httpclient 4.5.13 (sync)
- SimpleRequestBuilder -> HttpPut + ByteArrayEntity (repeatable)
- HttpAsyncClients defaults -> HttpClients with:
  * setRequestExecutor(HttpRequestExecutor(60s))
  * setRedirectStrategy(DorisRedirectStrategy) (isRedirectable=true,
    strip userinfo, normalize empty query)
  * setRetryHandler(DefaultHttpRequestRetryHandler(0, false))
  * setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
  * RequestConfig.setExpectContinueEnabled(true)
- Async future plumbing in TableEvents replaced with sync
response_code / response_body / response_error fields.
- Stringify both key and value at the request.addHeader call site: HC4's
addHeader(String, String) is strict about types, whereas HC5 had a
permissive (String, Object) overload, and user configs commonly carry
Float / Integer values such as 'max_filter_ratio => 1.0'.
- Drop 's.requirements << jar ...' from gemspec: with JARs vendored under
lib/, the maven lookup at install time is unnecessary and forced users
to set JARS_SKIP=true for offline installs.
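
As an illustration of the header stringification item above, here is a minimal sketch in plain Ruby. The helper name `stringify_headers` and the sample header set are assumptions made for the example; the actual plugin code may be structured differently.

```ruby
# Hypothetical helper: converts every header name and value to a String so
# that a strict (String, String) setter never receives a Float or Integer.
def stringify_headers(headers)
  headers.each_with_object({}) do |(key, value), out|
    out[key.to_s] = value.to_s
  end
end

# Sample user config with non-String values (illustrative only).
raw_headers = { "max_filter_ratio" => 1.0, :timeout => 600, "strict_mode" => true }
headers = stringify_headers(raw_headers)
```

Under JRuby, the equivalent at the call site would presumably be along the lines of `request.addHeader(key.to_s, value.to_s)`, as the list item describes.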
Pipeline configuration, retry queue, save_on_failure, group_commit, label
generation, header handling - all unchanged.
Verified on a SelectDB BYOC cluster mirroring the reported production shape
(16 workers, batch size 10,000, 200,000 events):
- Before: 100% of requests fail with CircularRedirectException
- After: 20/20 stream loads Status=Success, 200,000/200,000 rows
ingested, 0 HTTP-layer errors.
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->