Pierre Villard created NIFI-15583:
-------------------------------------

             Summary: S3 Processors use global endpoint instead of regional 
endpoint for us-east-1
                 Key: NIFI-15583
                 URL: https://issues.apache.org/jira/browse/NIFI-15583
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
            Reporter: Pierre Villard
            Assignee: Pierre Villard


h2. Problem

S3 processors fail to connect to S3 buckets in the us-east-1 region in 
environments that require the regional endpoint 
({{s3.us-east-1.amazonaws.com}}) instead of the global endpoint 
({{s3.amazonaws.com}}). This is the case in environments with outbound 
PrivateLink, where network rules are configured to allow traffic to 
{{.s3.us-east-1.amazonaws.com}} but not to the global endpoint.

The error observed is:
{noformat}
<bucket-name>.s3.amazonaws.com: Name or service not known
{noformat}

h2. Root Cause

This is caused by a bug in AWS SDK for Java v2 (confirmed in 2.41.26) where 
{{DefaultsMode.STANDARD}} does not correctly configure regional S3 endpoints 
for us-east-1.

NiFi correctly sets {{DefaultsMode.STANDARD}} on the {{S3Client}} builder in 
{{AbstractS3Processor.createClientBuilder()}}. Per the [AWS 
documentation|https://docs.aws.amazon.com/sdkref/latest/guide/setting-global-aws_defaults_mode.html],
 {{STANDARD}} mode should configure the SDK to use the regional S3 endpoint for 
us-east-1 ({{s3.us-east-1.amazonaws.com}}) instead of the legacy global 
endpoint ({{s3.amazonaws.com}}).

However, the SDK has a timing bug in its internal initialization sequence.

During client construction, 
{{DefaultS3BaseClientBuilder.finalizeServiceConfiguration()}} creates a 
{{UseGlobalEndpointResolver}} and stores its result in the client configuration 
as {{AwsClientOption.USE_GLOBAL_ENDPOINT}}.

{{UseGlobalEndpointResolver}} checks, in order: (a) the environment variable / 
system property {{AWS_S3_US_EAST_1_REGIONAL_ENDPOINT}}, (b) the AWS profile 
configuration, and (c) the {{DEFAULT_S3_US_EAST_1_REGIONAL_ENDPOINT}} value 
from the defaults mode configuration.
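
The lookup order above can be illustrated with a small standalone sketch. It 
does not use the SDK: {{RegionalEndpointLookupSketch}} and its methods are 
hypothetical stand-ins, and treating every value other than {{"regional"}} 
(including an unset value) as "use the global endpoint" is a simplifying 
assumption.

```java
// Hypothetical stand-in for the SDK's UseGlobalEndpointResolver. It models only
// the lookup order described above, not the real implementation.
final class RegionalEndpointLookupSketch {

    // Returns the first non-null setting, or null if every source is unset.
    static String firstConfigured(String... sources) {
        for (String value : sources) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Each argument is the raw setting from one source, or null when unset:
    // (a) environment variable / system property, (b) profile, (c) defaults mode.
    static boolean useGlobalEndpoint(String envOrSystemProperty,
                                     String profileValue,
                                     String defaultsModeValue) {
        String setting = firstConfigured(envOrSystemProperty, profileValue, defaultsModeValue);
        // Assumption: anything other than "regional" selects the global endpoint.
        return !"regional".equalsIgnoreCase(setting);
    }

    public static void main(String[] args) {
        // Defaults mode value present: regional endpoint is chosen.
        System.out.println(useGlobalEndpoint(null, null, "regional")); // false
        // Nothing set anywhere: falls back to the global endpoint.
        System.out.println(useGlobalEndpoint(null, null, null));       // true
    }
}
```

With all three sources unset, the sketch falls back to the global endpoint, 
which is the situation the rest of this description explains.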

The {{DEFAULT_S3_US_EAST_1_REGIONAL_ENDPOINT}} value (which 
{{DefaultsMode.STANDARD}} correctly maps to {{"regional"}}) is only 
populated later during 
{{AwsDefaultClientBuilder.finalizeAwsConfiguration()}}, which runs after 
{{finalizeServiceConfiguration()}}.

As a result, {{UseGlobalEndpointResolver}} always reads {{null}} for the 
defaults mode value and falls back to using the global endpoint.
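
The ordering problem can be reproduced in miniature without the SDK. In the 
sketch below, the two methods only borrow the names of the SDK steps described 
above; the {{Map}}-based configuration, the string keys, and the {{build}} 
helper are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the initialization order; not the SDK's actual code.
final class InitializationOrderSketch {

    static final String DEFAULTS_KEY = "DEFAULT_S3_US_EAST_1_REGIONAL_ENDPOINT";
    static final String USE_GLOBAL_KEY = "USE_GLOBAL_ENDPOINT";

    // Models finalizeServiceConfiguration(): resolves the flag once and stores it.
    static void finalizeServiceConfiguration(Map<String, Object> config) {
        Object defaultsValue = config.get(DEFAULTS_KEY); // null if not populated yet
        config.put(USE_GLOBAL_KEY, !"regional".equals(defaultsValue));
    }

    // Models finalizeAwsConfiguration(): applies the STANDARD defaults mode value.
    static void finalizeAwsConfiguration(Map<String, Object> config) {
        config.put(DEFAULTS_KEY, "regional");
    }

    // Runs both steps in the given order and returns the stored flag.
    static boolean build(boolean serviceStepFirst) {
        Map<String, Object> config = new HashMap<>();
        if (serviceStepFirst) {
            finalizeServiceConfiguration(config);
            finalizeAwsConfiguration(config);
        } else {
            finalizeAwsConfiguration(config);
            finalizeServiceConfiguration(config);
        }
        return (Boolean) config.get(USE_GLOBAL_KEY);
    }

    public static void main(String[] args) {
        System.out.println(build(true));  // true: flag resolved before defaults exist
        System.out.println(build(false)); // false: defaults are visible to the resolver
    }
}
```

Because the flag is computed once and stored, populating the defaults afterwards 
does not change it.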

At request time, {{S3ResolveEndpointInterceptor}} reads the (incorrectly 
resolved) {{USE_GLOBAL_ENDPOINT}} attribute and passes it to the 
{{S3EndpointProvider}} via {{S3EndpointParams.useGlobalEndpoint(true)}}, 
which causes the SDK to generate URLs with the global endpoint.

h2. Fix

The fix wraps the default {{S3EndpointProvider}} with a provider that overrides 
{{S3EndpointParams.useGlobalEndpoint()}} to {{false}} before delegating to the 
default provider. This ensures regional endpoints are always used, which is 
consistent with the behavior that {{DefaultsMode.STANDARD}} is supposed to 
provide.

This approach:
 * Uses entirely public SDK API ({{S3EndpointProvider}}, 
{{S3EndpointParams.toBuilder()}})
 * Does not modify global JVM state (no {{System.setProperty}})
 * Is scoped to NiFi's S3 clients only
 * Works for both the regular {{S3Client}} and {{S3EncryptionClient}} code paths
 * Is safe for all regions: {{useGlobalEndpoint}} is only relevant for 
us-east-1; for other regions, the value has no effect
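
The wrapping pattern can be sketched as follows. This is not the NiFi change 
itself and does not depend on the SDK: {{Params}} stands in for 
{{S3EndpointParams}}, {{defaultProvider}} for the default 
{{S3EndpointProvider}}, and the URL construction is a deliberately simplified 
assumption about virtual-hosted-style addressing.

```java
import java.util.function.Function;

// Illustrative stand-ins only; the real types are S3EndpointParams and
// S3EndpointProvider in the AWS SDK for Java v2.
final class RegionalEndpointWrapperSketch {

    // Stand-in for S3EndpointParams with just the fields this sketch needs.
    static final class Params {
        final String region;
        final String bucket;
        final boolean useGlobalEndpoint;

        Params(String region, String bucket, boolean useGlobalEndpoint) {
            this.region = region;
            this.bucket = bucket;
            this.useGlobalEndpoint = useGlobalEndpoint;
        }

        // Plays the role of toBuilder().useGlobalEndpoint(value).build().
        Params withUseGlobalEndpoint(boolean value) {
            return new Params(region, bucket, value);
        }
    }

    // Stand-in for the default provider: the flag matters only for us-east-1.
    static String defaultProvider(Params params) {
        boolean global = params.useGlobalEndpoint && "us-east-1".equals(params.region);
        String host = global ? "s3.amazonaws.com" : "s3." + params.region + ".amazonaws.com";
        return "https://" + params.bucket + "." + host;
    }

    // The wrapper: force useGlobalEndpoint to false, then delegate unchanged.
    static Function<Params, String> forceRegional(Function<Params, String> delegate) {
        return params -> delegate.apply(params.withUseGlobalEndpoint(false));
    }

    public static void main(String[] args) {
        Params params = new Params("us-east-1", "my-bucket", true);
        Function<Params, String> wrapped =
                forceRegional(RegionalEndpointWrapperSketch::defaultProvider);
        System.out.println(defaultProvider(params)); // https://my-bucket.s3.amazonaws.com
        System.out.println(wrapped.apply(params));   // https://my-bucket.s3.us-east-1.amazonaws.com
    }
}
```

Since the wrapper only rewrites one parameter and delegates everything else, 
requests for regions other than us-east-1 resolve exactly as they did before.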



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
