yinzixin opened a new pull request, #63409:
URL: https://github.com/apache/doris/pull/63409

   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   Problem Summary:
   
   Doris currently cannot read from or write to **Amazon S3 Express One Zone** 
buckets. Two distinct gaps block this:
   
   1. **AWS SDK is too old / wrong client builder.** The bundled SDK is 
`aws-sdk-cpp 1.11.219`, which predates S3 Express support. The S3 client is 
also constructed with the legacy `Aws::Client::ClientConfiguration` path, which 
bypasses the SDK's endpoint-rules resolver — so even on a newer SDK, the 
`--x-s3` bucket suffix would not trigger `CreateSession` or the `s3express` 
SigV4 service name.
   
   2. **`Content-MD5` is unconditionally attached to every PutObject and 
UploadPart.** S3 Express One Zone rejects `Content-MD5` with `501 
NotImplemented` and instead requires a flexible checksum (CRC32 / CRC32C / SHA1 
/ SHA256).
   
   This PR fixes both:
   
   **Commit 1 — `feat: add support for CRC32C checksum in S3 uploads and 
disable Content-MD5 for S3 Express`**
   
   - Adds a new BE config `s3_disable_content_md5` (default `false`) and 
auto-enables it when the endpoint string contains `s3express`.
   - When the flag is set (explicitly or via auto-enable), `PutObject` / 
`UploadPart` send `x-amz-checksum-crc32c` (computed via the existing `crc32c` 
thirdparty lib) instead of `Content-MD5`. Default behavior for non-S3-Express 
endpoints is unchanged.
   - Threads the endpoint string into `S3ObjStorageClient` so the per-client 
flag can be set at construction time.
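
   As a rough illustration of the Commit 1 behavior, the sketch below computes 
a CRC32C, encodes it the way S3 expects for `x-amz-checksum-crc32c` (base64 of 
the four big-endian checksum bytes), and picks the integrity header from the 
flag. It is not the code in this PR: the function names (`crc32c_of`, 
`crc32c_header_value`, `should_disable_content_md5`, `integrity_header_name`) 
are hypothetical, and the bitwise CRC loop stands in for the `crc32c` 
thirdparty lib that the PR actually uses.

```cpp
#include <cstdint>
#include <string>

// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
// Slow but dependency-free; an assumption-level stand-in for the crc32c lib.
inline uint32_t crc32c_of(const std::string& data) {
    uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // XOR in the polynomial only when the low bit is set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Base64 of the 4 checksum bytes in big-endian order (always 8 characters).
inline std::string crc32c_header_value(uint32_t crc) {
    static const char kAlphabet[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const uint32_t head = crc >> 8;    // first three bytes
    const uint32_t tail = crc & 0xFFu; // last byte
    std::string out;
    out += kAlphabet[(head >> 18) & 0x3F];
    out += kAlphabet[(head >> 12) & 0x3F];
    out += kAlphabet[(head >> 6) & 0x3F];
    out += kAlphabet[head & 0x3F];
    out += kAlphabet[tail >> 2];
    out += kAlphabet[(tail & 0x03u) << 4];
    out += "=="; // one trailing byte needs two padding characters
    return out;
}

// Hypothetical helper: explicit config flag OR an endpoint containing "s3express".
inline bool should_disable_content_md5(bool config_flag, const std::string& endpoint) {
    return config_flag || endpoint.find("s3express") != std::string::npos;
}

// Hypothetical helper: which integrity header an upload request would carry.
inline const char* integrity_header_name(bool disable_content_md5) {
    return disable_content_md5 ? "x-amz-checksum-crc32c" : "Content-MD5";
}
```

   For the standard CRC check input `"123456789"`, the checksum is 
`0xE3069283`, which encodes to the header value `4waSgw==`.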
   
   **Commit 2 — `Upgrade AWS SDK and use new create client to support S3 
express`**
   
   - Bumps `aws-sdk-cpp` from `1.11.219` to `1.11.400` in `thirdparty/vars.sh` 
(with refreshed MD5 / URL).
   - Switches `S3ClientFactory::_create_s3_client` to 
`Aws::S3::S3ClientConfiguration` + `S3EndpointProvider`, which activates the 
endpoint-rules resolver required for S3 Express (`CreateSession`, `s3express` 
SigV4 service name).
   - Detects S3 Express buckets by the `--x-s3` suffix or `s3express` endpoint 
substring; for those buckets we skip `endpointOverride` (so the SDK resolves 
the bucket-specific endpoint) and clear `disableS3ExpressAuth`. For all other 
buckets the behavior is unchanged.
   - Adds the `vaLog` override in `DorisAWSLogger` that 1.11.400's 
`LogSystemInterface` now requires.
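
   The detection rule in the third bullet can be sketched roughly as follows. 
This is a standalone model rather than the PR's code: `is_s3_express`, 
`ClientPlan`, and `plan_client` are hypothetical names, no AWS SDK types are 
used, and the values chosen for non-S3-Express buckets (override set whenever 
an endpoint is configured, `disableS3ExpressAuth` left on) are assumptions 
about the pre-existing behavior.

```cpp
#include <string>

// True when `s` ends with `suffix`.
inline bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// S3 Express is detected by the "--x-s3" bucket suffix or an endpoint
// containing "s3express".
inline bool is_s3_express(const std::string& bucket, const std::string& endpoint) {
    return ends_with(bucket, "--x-s3") ||
           endpoint.find("s3express") != std::string::npos;
}

// Hypothetical summary of the two client settings the PR description mentions.
struct ClientPlan {
    bool set_endpoint_override;   // whether endpointOverride would be populated
    bool disable_s3_express_auth; // value for disableS3ExpressAuth
};

inline ClientPlan plan_client(const std::string& bucket, const std::string& endpoint) {
    if (is_s3_express(bucket, endpoint)) {
        // Let the SDK resolve the bucket-specific endpoint and run CreateSession.
        return ClientPlan{false, false};
    }
    // Assumption: other buckets keep the override whenever an endpoint is given.
    return ClientPlan{!endpoint.empty(), true};
}
```

   A bucket such as `my-bucket--use1-az4--x-s3` (a made-up name) would take the 
first branch regardless of the configured endpoint, while a MinIO endpoint 
would keep its override.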
   
   ### Release note
   
   Support reading from / writing to Amazon S3 Express One Zone buckets. A new 
BE config `s3_disable_content_md5` is added to send a CRC32C checksum instead 
of `Content-MD5` on uploads; it is auto-enabled for endpoints whose hostname 
contains `s3express`.
   
   ### Check List (For Author)
   
   - Test
       - [x] Manual test (add detailed scripts or steps below)
           - Verified PutObject / multipart upload against an `--x-s3` (S3 
Express One Zone) bucket in `us-east-1`: `CreateSession` is issued, requests 
are signed as `s3express`, and uploads succeed with CRC32C in place of 
`Content-MD5`.
           - Verified regression-free behavior against a regular S3 bucket 
(with and without `endpointOverride`) and against an S3-compatible MinIO 
endpoint — `Content-MD5` is still sent and uploads succeed.
   
   - Behavior changed:
       - [x] Yes.
           - `s3_disable_content_md5=true` (or any endpoint matching 
`s3express`) replaces `Content-MD5` with CRC32C on `PutObject` / `UploadPart`. 
Default is unchanged for non-S3-Express endpoints.
           - AWS SDK upgraded from 1.11.219 → 1.11.400; the BE links a rebuilt 
thirdparty bundle.
   
   - Does this need documentation?
       - [ ] Yes — a short note for the new `s3_disable_content_md5` BE config 
and an S3 Express One Zone usage example. Will open the doris-website PR once 
this is reviewed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

