yinzixin opened a new pull request, #63409:
URL: https://github.com/apache/doris/pull/63409
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Doris currently cannot read from or write to **Amazon S3 Express One Zone**
buckets. Two distinct gaps block this:
1. **AWS SDK is too old / wrong client builder.** The bundled SDK is
`aws-sdk-cpp 1.11.219`, which predates S3 Express support. The S3 client is
also constructed with the legacy `Aws::Client::ClientConfiguration` path, which
bypasses the SDK's endpoint-rules resolver — so even on a newer SDK, the
`--x-s3` bucket suffix would not trigger `CreateSession` or the `s3express`
SigV4 service name.
2. **`Content-MD5` is unconditionally attached to every PutObject and
UploadPart.** S3 Express One Zone rejects `Content-MD5` with `501
NotImplemented` and instead requires a flexible checksum (CRC32 / CRC32C / SHA1
/ SHA256).
This PR fixes both:
**Commit 1 — `feat: add support for CRC32C checksum in S3 uploads and
disable Content-MD5 for S3 Express`**
- Adds a new BE config `s3_disable_content_md5` (default `false`) and
auto-enables it when the endpoint string contains `s3express`.
- When the flag is on (that is, `Content-MD5` is disabled), `PutObject` /
`UploadPart` send `x-amz-checksum-crc32c` (computed via the existing `crc32c`
thirdparty lib) instead of `Content-MD5`. Behavior for non-S3-Express
endpoints is unchanged.
- Threads the endpoint string into `S3ObjStorageClient` so the per-client
flag can be set at construction time.
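For reviewers, here is a minimal standalone sketch of the Commit 1 behavior. It is not the code in this PR: the function names `endpoint_is_s3_express`, `should_send_crc32c`, and `crc32c_header_value` are hypothetical, and the bitwise `crc32c` below is a slow stand-in for the thirdparty library. It assumes the header value is the Base64 encoding of the 32-bit CRC32C in big-endian byte order.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: true when the endpoint string mentions "s3express",
// mirroring the auto-enable rule described above.
inline bool endpoint_is_s3_express(std::string_view endpoint) {
    return endpoint.find("s3express") != std::string_view::npos;
}

// Hypothetical helper: CRC32C replaces Content-MD5 when the config flag is
// set explicitly or the endpoint looks like S3 Express.
inline bool should_send_crc32c(bool s3_disable_content_md5, std::string_view endpoint) {
    return s3_disable_content_md5 || endpoint_is_s3_express(endpoint);
}

// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
// Dependency-free stand-in; a real build would use an optimized library.
inline uint32_t crc32c(std::string_view data) {
    uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Encodes the checksum as Base64 over its four big-endian bytes
// (assumed wire format for the x-amz-checksum-crc32c header).
inline std::string crc32c_header_value(uint32_t crc) {
    static constexpr char kAlphabet[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const std::array<uint8_t, 4> b = {
            static_cast<uint8_t>(crc >> 24), static_cast<uint8_t>(crc >> 16),
            static_cast<uint8_t>(crc >> 8), static_cast<uint8_t>(crc)};
    std::string out;
    // First three bytes map to four Base64 characters.
    const uint32_t n = (uint32_t(b[0]) << 16) | (uint32_t(b[1]) << 8) | b[2];
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
    // The remaining byte maps to two characters plus "==" padding.
    out += kAlphabet[b[3] >> 2];
    out += kAlphabet[(b[3] & 0x3) << 4];
    out += "==";
    return out;
}
```

With the standard check input `"123456789"`, `crc32c` returns `0xE3069283` and `crc32c_header_value` yields `4waSgw==`.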
**Commit 2 — `Upgrade AWS SDK and use new create client to support S3
express`**
- Bumps `aws-sdk-cpp` from `1.11.219` to `1.11.400` in `thirdparty/vars.sh`
(with refreshed MD5 / URL).
- Switches `S3ClientFactory::_create_s3_client` to
`Aws::S3::S3ClientConfiguration` + `S3EndpointProvider`, which activates the
endpoint-rules resolver required for S3 Express (`CreateSession`, `s3express`
SigV4 service name).
- Detects S3 Express buckets by the `--x-s3` bucket-name suffix or an
`s3express` substring in the endpoint. For those buckets, `endpointOverride`
is left unset (so the SDK resolves the bucket-specific endpoint) and
`disableS3ExpressAuth` is set to `false`. Behavior for all other buckets is
unchanged.
- Adds the `vaLog` override in `DorisAWSLogger` that 1.11.400's
`LogSystemInterface` now requires.
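The bucket detection rule in Commit 2 can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code in `S3ClientFactory`: `S3ClientPlan`, `is_s3_express`, and `plan_client` are hypothetical names. Setting `disable_s3_express_auth` to `true` for regular buckets is also an assumption, inferred from the description that the flag is cleared only for S3 Express buckets.

```cpp
#include <string>
#include <string_view>

// Hypothetical summary of the per-client decisions described above.
struct S3ClientPlan {
    bool is_s3_express = false;
    bool set_endpoint_override = false;   // copy the endpoint into endpointOverride?
    bool disable_s3_express_auth = true;  // assumed value for regular buckets
};

// True when `s` ends with `suffix`.
inline bool ends_with(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// A bucket counts as S3 Express if its name carries the "--x-s3" suffix
// or the configured endpoint contains "s3express".
inline bool is_s3_express(std::string_view bucket, std::string_view endpoint) {
    return ends_with(bucket, "--x-s3") ||
           endpoint.find("s3express") != std::string_view::npos;
}

inline S3ClientPlan plan_client(std::string_view bucket, std::string_view endpoint) {
    S3ClientPlan plan;
    plan.is_s3_express = is_s3_express(bucket, endpoint);
    // S3 Express: leave the override unset so the SDK resolves the
    // bucket-specific endpoint. Otherwise keep using the configured endpoint.
    plan.set_endpoint_override = !plan.is_s3_express && !endpoint.empty();
    // S3 Express needs session-based auth, so the disable flag is cleared.
    plan.disable_s3_express_auth = !plan.is_s3_express;
    return plan;
}
```

For example, `plan_client("logs--use1-az4--x-s3", "s3.us-east-1.amazonaws.com")` (an illustrative bucket name) reports an S3 Express bucket with no endpoint override, while `plan_client("logs", "http://minio.local:9000")` keeps the override.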
### Release note
Support reading from / writing to Amazon S3 Express One Zone buckets. A new
BE config `s3_disable_content_md5` is added to send a CRC32C checksum instead
of `Content-MD5` on uploads. It is auto-enabled for endpoints whose endpoint
string contains `s3express`.
### Check List (For Author)
- Test
    - [x] Manual test (add detailed scripts or steps below)
        - Verified `PutObject` / multipart upload against an `--x-s3` (S3
          Express One Zone) bucket in `us-east-1`: `CreateSession` is issued,
          requests are signed as `s3express`, and uploads succeed with CRC32C
          in place of `Content-MD5`.
        - Verified regression-free behavior against a regular S3 bucket
          (with and without `endpointOverride`) and against an S3-compatible
          MinIO endpoint: `Content-MD5` is still sent and uploads succeed.
- Behavior changed:
    - [x] Yes.
        - `s3_disable_content_md5=true` (or any endpoint matching
          `s3express`) replaces `Content-MD5` with CRC32C on `PutObject` /
          `UploadPart`. Default is unchanged for non-S3-Express endpoints.
        - AWS SDK upgraded from `1.11.219` to `1.11.400`; the BE links a
          rebuilt thirdparty bundle.
- Does this need documentation?
    - [ ] Yes. A short note for the new `s3_disable_content_md5` BE config
      and an S3 Express One Zone usage example. Will open the doris-website
      PR once this is reviewed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]