vivek807 opened a new issue, #19608:
URL: https://github.com/apache/druid/issues/19608
### Description
Add support for AWS S3 Multi-Region Access Points (MRAPs) and S3 Access
Point ARNs in Druid's S3 extension.
Currently, the `bucket` field in Druid's S3 configuration only accepts
standard DNS-compliant bucket names. AWS Access Point ARNs (e.g.,
`arn:aws:s3::123456789123:accesspoint:bucket.mrap`) are rejected at
construction time in `CloudObjectLocation` because they fail the URL-encoding
equality check used to enforce DNS naming rules. Additionally, some tools
produce ARNs with a slash separator (`accesspoint/alias`) instead of the
colon-delimited form (`accesspoint:alias`) expected by the AWS SDK, causing
further failures downstream.
This change:
- Relaxes the bucket name validation in `CloudObjectLocation` to permit valid
S3 Access Point ARNs alongside DNS-compliant names.
- Adds `S3Utils.normalizeBucketName()` to canonicalize the slash-delimited
form to the colon-delimited form at ingestion points
(`S3DataSegmentPusherConfig`, `S3LoadSpec`).
- Supports both regional Access Point ARNs
(`arn:aws:s3:<region>:<account>:accesspoint:<name>`) and MRAP ARNs
(`arn:aws:s3::<account>:accesspoint:<name>.mrap`).
No API surface changes; the bucket configuration field continues to accept
plain bucket names unchanged.
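As a rough illustration of the normalization step described above, the following sketch rewrites the slash-delimited Access Point form to the colon-delimited one and leaves everything else untouched. It is not the proposed `S3Utils.normalizeBucketName()` implementation: the class name `BucketNameNormalizer`, the regular expression, and the assumption that account IDs are 12 digits are all illustrative choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the real method is proposed to live in S3Utils.
final class BucketNameNormalizer {
    // Assumed ARN shape: arn:<partition>:s3:<region or empty>:<12-digit account>:accesspoint/<name>
    private static final Pattern SLASH_FORM =
        Pattern.compile("^(arn:[^:]+:s3:[^:]*:[0-9]{12}:accesspoint)/(.+)$");

    private BucketNameNormalizer() {}

    // Returns the colon-delimited ARN for slash-delimited input; any other value is returned as-is.
    static String normalizeBucketName(String bucket) {
        if (bucket == null) {
            return null;
        }
        Matcher m = SLASH_FORM.matcher(bucket);
        return m.matches() ? m.group(1) + ":" + m.group(2) : bucket;
    }

    public static void main(String[] args) {
        // Slash form is rewritten to the colon form.
        System.out.println(normalizeBucketName("arn:aws:s3::123456789123:accesspoint/bucket.mrap"));
        // Plain bucket names pass through unchanged.
        System.out.println(normalizeBucketName("my-bucket"));
    }
}
```

Because plain bucket names and already-canonical ARNs are returned unchanged, a helper of this shape could be applied unconditionally wherever a bucket value is read from configuration.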
### Motivation
**Use case**
AWS Multi-Region Access Points provide a single global S3 endpoint that
routes requests to the nearest healthy bucket replica across regions. Operators
use MRAPs for:
- Active-active multi-region Druid deployments backed by S3 Cross-Region
Replication (CRR).
- Disaster recovery setups where deep storage must remain accessible during
a regional outage.
- Simplifying Druid configuration across regions: one ARN in
`druid.storage.bucket` instead of per-region overrides.

Single-region Access Point ARNs are also used more broadly to enforce
fine-grained IAM access controls on shared buckets without exposing the bucket
name.
**Why the current behavior blocks this**
`CloudObjectLocation` enforces:
```java
Preconditions.checkArgument(
this.bucket.equals(StringUtils.urlEncode(this.bucket)),
"bucket must follow DNS-compliant naming conventions"
);
```
An ARN like `arn:aws:s3::123456789123:accesspoint:bucket.mrap` URL-encodes
to `arn%3Aaws%3As3%3A%3A123456789123%3Aaccesspoint%3Abucket.mrap` because
every colon is percent-encoded, so the equality check always fails. There is
no escape hatch. Users who configure an MRAP ARN as the Druid storage bucket
receive an `IllegalArgumentException` at startup with no workaround short of
patching the code.
The AWS SDK for Java (v1 and v2) accepts ARN strings wherever a bucket name
is expected, so no SDK-level changes are required. The fix is purely a
validation relaxation and a normalization helper.
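A minimal sketch of what the relaxed validation could look like is shown below. It is an assumption-laden illustration, not the actual patch: `BucketValidationSketch`, `isValidBucket`, and the ARN regular expression are hypothetical, and Druid's `StringUtils.urlEncode` is approximated here with `java.net.URLEncoder` plus a `+` to `%20` replacement.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Pattern;

// Hypothetical stand-in for the check in CloudObjectLocation.
final class BucketValidationSketch {
    // Assumed shape for regional and multi-region Access Point ARNs (colon-delimited form only).
    private static final Pattern ACCESS_POINT_ARN =
        Pattern.compile("^arn:[^:]+:s3:[a-z0-9-]*:[0-9]{12}:accesspoint:[a-z0-9.-]+$");

    private BucketValidationSketch() {}

    // Approximation of Druid's StringUtils.urlEncode (assumption).
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Accepts a name that survives URL-encoding unchanged, or a well-formed Access Point ARN.
    static boolean isValidBucket(String bucket) {
        return bucket.equals(urlEncode(bucket)) || ACCESS_POINT_ARN.matcher(bucket).matches();
    }

    public static void main(String[] args) {
        String mrap = "arn:aws:s3::123456789123:accesspoint:bucket.mrap";
        // The colons are percent-encoded, which is why the original equality check rejects the ARN.
        System.out.println(urlEncode(mrap));
        System.out.println(isValidBucket(mrap));
        System.out.println(isValidBucket("my-bucket"));
    }
}
```

Keeping the original equality check as the first branch means existing bucket names are validated exactly as before; the ARN pattern only widens what is accepted.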