Pierre Villard created NIFI-15583:
-------------------------------------
Summary: S3 Processors use global endpoint instead of regional
endpoint for us-east-1
Key: NIFI-15583
URL: https://issues.apache.org/jira/browse/NIFI-15583
Project: Apache NiFi
Issue Type: Bug
Components: Extensions
Reporter: Pierre Villard
Assignee: Pierre Villard
h2. Problem
S3 processors fail to connect to S3 buckets in the us-east-1 region in
environments that require the regional endpoint
({{s3.us-east-1.amazonaws.com}}) instead of the global endpoint
({{s3.amazonaws.com}}). This is the case in environments with outbound
PrivateLink, where network rules are configured to allow traffic to
{{.s3.us-east-1.amazonaws.com}} but not to the global endpoint.
The error observed is:
{noformat}
<bucket-name>.s3.amazonaws.com: Name or service not known
{noformat}
h2. Root Cause
This is caused by a bug in the AWS SDK for Java v2 (confirmed in 2.41.26) where
{{DefaultsMode.STANDARD}} does not correctly configure regional S3 endpoints
for us-east-1.
NiFi correctly sets {{DefaultsMode.STANDARD}} on the {{S3Client}} builder in
{{AbstractS3Processor.createClientBuilder()}}. Per the [AWS
documentation|https://docs.aws.amazon.com/sdkref/latest/guide/setting-global-aws_defaults_mode.html],
{{STANDARD}} mode should configure the SDK to use the regional S3 endpoint for
us-east-1 ({{s3.us-east-1.amazonaws.com}}) instead of the legacy global
endpoint ({{s3.amazonaws.com}}).
However, the SDK has a timing bug in its internal initialization sequence:
# During client construction,
{{DefaultS3BaseClientBuilder.finalizeServiceConfiguration()}} creates a
{{UseGlobalEndpointResolver}} and stores its result in the client configuration
as {{AwsClientOption.USE_GLOBAL_ENDPOINT}}.
# {{UseGlobalEndpointResolver}} checks, in order: (a) the environment variable /
system property {{AWS_S3_US_EAST_1_REGIONAL_ENDPOINT}}, (b) the AWS profile
configuration, and (c) the {{DEFAULT_S3_US_EAST_1_REGIONAL_ENDPOINT}} value
from the defaults mode configuration.
# The {{DEFAULT_S3_US_EAST_1_REGIONAL_ENDPOINT}} value (which
{{DefaultsMode.STANDARD}} correctly maps to {{"regional"}}) is only
populated later during
{{AwsDefaultClientBuilder.finalizeAwsConfiguration()}}, which runs after
{{finalizeServiceConfiguration()}}.
# As a result, {{UseGlobalEndpointResolver}} always reads {{null}} for the
defaults mode value and falls back to using the global endpoint.
# At request time, {{S3ResolveEndpointInterceptor}} reads the (incorrectly
resolved) {{USE_GLOBAL_ENDPOINT}} attribute and passes it to the
{{S3EndpointProvider}} via {{S3EndpointParams.useGlobalEndpoint(true)}},
which causes the SDK to generate URLs with the global endpoint.
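The ordering problem described above can be reproduced in a small, SDK-free model. The sketch below is illustrative only: {{EndpointTimingDemo}}, its map-based configuration, and the rule that any value other than {{"regional"}} selects the global endpoint are assumptions made for the example, not the SDK's actual classes or code. Only the method names {{finalizeServiceConfiguration}} and {{finalizeAwsConfiguration}} and the lookup order are taken from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the ordering issue; these are stand-ins, not SDK classes.
final class EndpointTimingDemo {
    static final String DEFAULTS_KEY = "DEFAULT_S3_US_EAST_1_REGIONAL_ENDPOINT";
    static final String GLOBAL_FLAG = "USE_GLOBAL_ENDPOINT";

    // Lookup order from the description: explicit setting, then profile, then defaults mode.
    static boolean resolveUseGlobal(String explicit, String profile, Map<String, Object> config) {
        String value = explicit != null ? explicit
                : profile != null ? profile
                : (String) config.get(DEFAULTS_KEY);
        // Assumption: only "regional" selects the regional endpoint; a missing value means global.
        return !"regional".equalsIgnoreCase(value);
    }

    // Stand-in for finalizeServiceConfiguration(): resolves the flag and stores it.
    static void finalizeServiceConfiguration(Map<String, Object> config) {
        config.put(GLOBAL_FLAG, resolveUseGlobal(null, null, config));
    }

    // Stand-in for finalizeAwsConfiguration(): applies the STANDARD defaults.
    static void finalizeAwsConfiguration(Map<String, Object> config) {
        config.put(DEFAULTS_KEY, "regional");
    }

    // Order described in this issue: the flag is resolved before the default exists.
    static boolean useGlobalWithReportedOrder() {
        Map<String, Object> config = new HashMap<>();
        finalizeServiceConfiguration(config);
        finalizeAwsConfiguration(config);
        return (Boolean) config.get(GLOBAL_FLAG);
    }

    // Reversed order, shown only for contrast.
    static boolean useGlobalWithDefaultsFirst() {
        Map<String, Object> config = new HashMap<>();
        finalizeAwsConfiguration(config);
        finalizeServiceConfiguration(config);
        return (Boolean) config.get(GLOBAL_FLAG);
    }

    public static void main(String[] args) {
        System.out.println("reported order -> useGlobal=" + useGlobalWithReportedOrder());
        System.out.println("defaults first -> useGlobal=" + useGlobalWithDefaultsFirst());
    }
}
```

In this model the reported order yields {{useGlobal=true}} even though the defaults are set to {{"regional"}} afterwards, while resolving the flag after the defaults are populated yields {{false}}.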
h2. Fix
The fix wraps the default {{S3EndpointProvider}} with a provider that overrides
{{S3EndpointParams.useGlobalEndpoint()}} to {{false}} before delegating to the
default provider. This ensures regional endpoints are always used, which is
consistent with the behavior that {{DefaultsMode.STANDARD}} is supposed to
provide.
This approach:
* Uses entirely public SDK API ({{S3EndpointProvider}},
{{S3EndpointParams.toBuilder()}})
* Does not modify global JVM state (no {{System.setProperty}})
* Is scoped to NiFi's S3 clients only
* Works for both the regular {{S3Client}} and {{S3EncryptionClient}} code paths
* Is safe for all regions — {{useGlobalEndpoint}} is only relevant for
us-east-1; for other regions, the value has no effect
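The shape of the wrapper can be sketched without the SDK on the classpath. Everything in the block below is a hypothetical stand-in: {{Params}} plays the role of {{S3EndpointParams}}, {{defaultResolve}} is a toy substitute for the default {{S3EndpointProvider}}, and {{forceRegional}} is an illustrative name, not the one used in the NiFi change. In the real fix the copy would presumably be made through {{S3EndpointParams.toBuilder()}} as noted in the list above.

```java
import java.util.function.Function;

// Stand-in types that mirror the shape of the wrapper; not the SDK's classes.
final class RegionalEndpointSketch {

    // Minimal substitute for S3EndpointParams: region plus the global-endpoint flag.
    static final class Params {
        final String region;
        final boolean useGlobalEndpoint;

        Params(String region, boolean useGlobalEndpoint) {
            this.region = region;
            this.useGlobalEndpoint = useGlobalEndpoint;
        }

        // Plays the role of toBuilder().useGlobalEndpoint(value).build().
        Params withUseGlobalEndpoint(boolean value) {
            return new Params(region, value);
        }
    }

    // Toy default provider: the flag only changes the result for us-east-1.
    static String defaultResolve(Params params) {
        if (params.useGlobalEndpoint && "us-east-1".equals(params.region)) {
            return "https://s3.amazonaws.com";
        }
        return "https://s3." + params.region + ".amazonaws.com";
    }

    // The wrapper: clear the flag, then delegate to the wrapped provider.
    static Function<Params, String> forceRegional(Function<Params, String> delegate) {
        return params -> delegate.apply(params.withUseGlobalEndpoint(false));
    }

    public static void main(String[] args) {
        Function<Params, String> provider = forceRegional(RegionalEndpointSketch::defaultResolve);
        System.out.println(provider.apply(new Params("us-east-1", true)));
        System.out.println(provider.apply(new Params("eu-west-1", true)));
    }
}
```

Because the wrapper rewrites only the incoming parameters and leaves the delegate untouched, it stays local to the clients it is installed on, which matches the scoping points listed above.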