Copilot commented on code in PR #60248:
URL: https://github.com/apache/doris/pull/60248#discussion_r2729008328
##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -148,20 +170,52 @@ private static AwsCredentialsProvider
getAwsCredencialsProviderV1(URI endpoint,
}
if (!Strings.isNullOrEmpty(roleArn)) {
- StsClient stsClient = StsClient.builder()
- .credentialsProvider(DefaultCredentialsProvider.create())
- .build();
-
- return StsAssumeRoleCredentialsProvider.builder()
- .stsClient(stsClient)
- .refreshRequest(builder -> {
-
builder.roleArn(roleArn).roleSessionName("aws-sdk-java-v2-fe");
- if (!Strings.isNullOrEmpty(externalId)) {
- builder.externalId(externalId);
- }
- }).build();
+ // Cache role-based credentials by roleArn:externalId
+ String cacheKey = "v1:" + roleArn + ":" + (externalId != null ?
externalId : "");
+ return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+ StsClient stsClient = StsClient.builder()
+
.credentialsProvider(getCachedDefaultCredentialsProvider())
+ .build();
Review Comment:
The StsClient instances created within the cached credential providers are
never explicitly closed, potentially leading to resource leaks. The StsClient
implements AutoCloseable and should be closed to release its underlying HTTP
client resources. Consider managing the lifecycle of StsClient instances
properly, either by caching them separately or ensuring they are closed when
the credential provider is no longer needed.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/common/AwsCredentialsProviderFactory.java:
##########
@@ -33,19 +33,38 @@
import java.util.ArrayList;
import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
public final class AwsCredentialsProviderFactory {
private AwsCredentialsProviderFactory() {
}
+ // Cache for AWS SDK V2 credential providers to prevent thread leak
+ // Each credential provider type creates internal ScheduledExecutorService
threads
+ // for credential refresh. Without caching, every call to createV2()
creates new
+ // threads that are never cleaned up, causing memory exhaustion.
+ // Key format: "mode:includeAnonymous" for DEFAULT mode, "mode" for others
+ private static final ConcurrentMap<String, AwsCredentialsProvider>
V2_PROVIDER_CACHE =
+ new ConcurrentHashMap<>();
Review Comment:
The credential provider caches use ConcurrentHashMap with no eviction
policy, which means cached providers will remain in memory indefinitely. This
could lead to memory accumulation over time if many different credential
configurations are used throughout the application lifecycle. Consider
implementing a cache eviction strategy using a bounded cache implementation
like Caffeine or Guava's Cache with time-based or size-based eviction.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/S3Properties.java:
##########
@@ -334,20 +362,25 @@ private AwsCredentialsProvider
getAwsCredentialsProviderV2() {
return credentialsProvider;
}
if (StringUtils.isNotBlank(s3IAMRole)) {
- StsClient stsClient = StsClient.builder()
- .region(Region.of(region))
-
.credentialsProvider(AwsCredentialsProviderFactory.createV2(
- awsCredentialsProviderMode,
- false))
- .build();
- return StsAssumeRoleCredentialsProvider.builder()
- .stsClient(stsClient)
- .refreshRequest(builder -> {
-
builder.roleArn(s3IAMRole).roleSessionName("aws-sdk-java-v2-fe");
- if (StringUtils.isNotBlank(s3ExternalId)) {
- builder.externalId(s3ExternalId);
- }
- }).build();
+ // Cache role-based credentials by region:roleArn:externalId:mode
+ String cacheKey = "v2:" + region + ":" + s3IAMRole + ":"
+ + (s3ExternalId != null ? s3ExternalId : "") + ":" +
awsCredentialsProviderMode;
+ return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+ StsClient stsClient = StsClient.builder()
+ .region(Region.of(region))
+
.credentialsProvider(AwsCredentialsProviderFactory.createV2(
+ awsCredentialsProviderMode,
+ false))
+ .build();
Review Comment:
The StsClient instances created within the cached credential providers are
never explicitly closed, potentially leading to resource leaks. The StsClient
implements AutoCloseable and should be closed to release its underlying HTTP
client resources. Consider managing the lifecycle of StsClient instances
properly, either by caching them separately or ensuring they are closed when
the credential provider is no longer needed.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/S3Properties.java:
##########
@@ -50,13 +50,36 @@
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;
import java.util.stream.Stream;
public class S3Properties extends AbstractS3CompatibleProperties {
private static final Logger LOG = LogManager.getLogger(S3Properties.class);
+ // Cached credential providers to prevent thread leak.
+ // Each credential provider creates internal ScheduledExecutorService
threads for credential refresh.
+ // Without caching, every call creates new threads that are never cleaned
up.
+ private static volatile InstanceProfileCredentialsProvider
cachedInstanceProfileProvider = null;
+ private static final Object LOCK = new Object();
+
+ // Cache for role-based credential providers, keyed by
region:roleArn:externalId:mode
+ private static final ConcurrentMap<String, AwsCredentialsProvider>
ROLE_CREDENTIALS_CACHE =
+ new ConcurrentHashMap<>();
Review Comment:
The credential provider caches use ConcurrentHashMap with no eviction
policy, which means cached providers will remain in memory indefinitely. This
could lead to memory accumulation over time if many different credential
configurations are used throughout the application lifecycle. Consider
implementing a cache eviction strategy using a bounded cache implementation
like Caffeine or Guava's Cache with time-based or size-based eviction.
##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -58,12 +58,26 @@
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class S3Util {
private static final Logger LOG = LogManager.getLogger(Util.class);
+ // Cached credential providers to prevent thread leak
+ // Each credential provider creates internal ScheduledExecutorService
threads for credential refresh.
+ // Without caching, every call creates new threads that are never cleaned
up.
+ private static volatile AwsCredentialsProvider cachedDefaultChainProvider
= null;
+ private static volatile DefaultCredentialsProvider
cachedDefaultCredentialsProvider = null;
+ private static volatile AwsCredentialsProvider cachedV2ChainProvider =
null;
+ private static final Object LOCK = new Object();
+
+ // Cache for role-based credential providers, keyed by roleArn:externalId
+ private static final ConcurrentMap<String, AwsCredentialsProvider>
ROLE_CREDENTIALS_CACHE =
+ new ConcurrentHashMap<>();
Review Comment:
The credential provider caches use ConcurrentHashMap with no eviction
policy, which means cached providers will remain in memory indefinitely. This
could lead to memory accumulation over time if many different credential
configurations are used throughout the application lifecycle. Consider
implementing a cache eviction strategy using a bounded cache implementation
like Caffeine or Guava's Cache with time-based or size-based eviction.
##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -148,20 +170,52 @@ private static AwsCredentialsProvider
getAwsCredencialsProviderV1(URI endpoint,
}
if (!Strings.isNullOrEmpty(roleArn)) {
- StsClient stsClient = StsClient.builder()
- .credentialsProvider(DefaultCredentialsProvider.create())
- .build();
-
- return StsAssumeRoleCredentialsProvider.builder()
- .stsClient(stsClient)
- .refreshRequest(builder -> {
-
builder.roleArn(roleArn).roleSessionName("aws-sdk-java-v2-fe");
- if (!Strings.isNullOrEmpty(externalId)) {
- builder.externalId(externalId);
- }
- }).build();
+ // Cache role-based credentials by roleArn:externalId
+ String cacheKey = "v1:" + roleArn + ":" + (externalId != null ?
externalId : "");
Review Comment:
The cache key for role-based credentials is missing the region parameter,
which could cause credentials from different regions to be incorrectly shared.
The cache key should include the region to prevent credential providers
configured for different AWS regions from sharing the same cached instance.
This could lead to authentication failures when the same roleArn is used across
multiple regions.
```suggestion
// Cache role-based credentials by region:roleArn:externalId
String cacheKey = "v1:" + (region != null ? region : "") + ":" +
roleArn + ":"
+ (externalId != null ? externalId : "");
```
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/S3Properties.java:
##########
@@ -310,19 +333,24 @@ private AwsCredentialsProvider
getAwsCredentialsProviderV1() {
return credentialsProvider;
}
if (StringUtils.isNotBlank(s3IAMRole)) {
- StsClient stsClient = StsClient.builder()
- .region(Region.of(region))
-
.credentialsProvider(InstanceProfileCredentialsProvider.create())
- .build();
-
- return StsAssumeRoleCredentialsProvider.builder()
- .stsClient(stsClient)
- .refreshRequest(builder -> {
-
builder.roleArn(s3IAMRole).roleSessionName("aws-sdk-java-v2-fe");
- if (StringUtils.isNotBlank(s3ExternalId)) {
- builder.externalId(s3ExternalId);
- }
- }).build();
+ // Cache role-based credentials by region:roleArn:externalId:mode
+ String cacheKey = "v1:" + region + ":" + s3IAMRole + ":"
+ + (s3ExternalId != null ? s3ExternalId : "");
+ return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+ StsClient stsClient = StsClient.builder()
+ .region(Region.of(region))
+
.credentialsProvider(getCachedInstanceProfileProvider())
+ .build();
+
+ return StsAssumeRoleCredentialsProvider.builder()
+ .stsClient(stsClient)
+ .refreshRequest(builder -> {
+
builder.roleArn(s3IAMRole).roleSessionName("aws-sdk-java-v2-fe");
+ if (StringUtils.isNotBlank(s3ExternalId)) {
+ builder.externalId(s3ExternalId);
+ }
+ }).build();
+ });
}
// For anonymous access (no credentials required)
return AnonymousCredentialsProvider.create();
Review Comment:
AnonymousCredentialsProvider.create() should also be cached since it creates
an instance that could potentially be reused. While anonymous credentials don't
require refresh threads like other providers, caching would be more consistent
with the overall pattern and could prevent unnecessary object creation.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/metastore/AWSGlueMetaStoreBaseProperties.java:
##########
@@ -176,36 +204,29 @@ public AwsCredentialsProvider getAwsCredentialsProvider()
{
glueSessionToken));
}
}
- // If IAM role is configured, use STS AssumeRole
+ // If IAM role is configured, use STS AssumeRole with cached providers
if (StringUtils.isNotBlank(glueIAMRole)) {
- StsClient stsClient = StsClient.builder()
- .region(Region.of(glueRegion))
- .credentialsProvider(AwsCredentialsProviderChain.of(
- WebIdentityTokenFileCredentialsProvider.create(),
- ContainerCredentialsProvider.create(),
- InstanceProfileCredentialsProvider.create(),
- SystemPropertyCredentialsProvider.create(),
- EnvironmentVariableCredentialsProvider.create(),
- ProfileCredentialsProvider.create()))
- .build();
-
- return StsAssumeRoleCredentialsProvider.builder()
- .stsClient(stsClient)
- .refreshRequest(builder -> {
-
builder.roleArn(glueIAMRole).roleSessionName("aws-glue-java-fe");
- if (StringUtils.isNotBlank(glueExternalId)) {
- builder.externalId(glueExternalId);
- }
- }).build();
+ // Cache role-based credentials by region:roleArn:externalId
+ String cacheKey = glueRegion + ":" + glueIAMRole + ":"
+ + (glueExternalId != null ? glueExternalId : "");
+ return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+ StsClient stsClient = StsClient.builder()
+ .region(Region.of(glueRegion))
+ .credentialsProvider(getCachedDefaultChainProvider())
+ .build();
Review Comment:
The StsClient instances created within the cached credential providers are
never explicitly closed, potentially leading to resource leaks. The StsClient
implements AutoCloseable and should be closed to release its underlying HTTP
client resources. Consider managing the lifecycle of StsClient instances
properly, either by caching them separately or ensuring they are closed when
the credential provider is no longer needed.
##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -177,32 +231,24 @@ private static AwsCredentialsProvider
getAwsCredencialsProviderV2(URI endpoint,
}
if (!Strings.isNullOrEmpty(roleArn)) {
- StsClient stsClient = StsClient.builder()
- .credentialsProvider(AwsCredentialsProviderChain.of(
- WebIdentityTokenFileCredentialsProvider.create(),
- ContainerCredentialsProvider.create(),
- InstanceProfileCredentialsProvider.create(),
- SystemPropertyCredentialsProvider.create(),
- EnvironmentVariableCredentialsProvider.create(),
- ProfileCredentialsProvider.create()))
- .build();
-
- return StsAssumeRoleCredentialsProvider.builder()
- .stsClient(stsClient)
- .refreshRequest(builder -> {
-
builder.roleArn(roleArn).roleSessionName("aws-sdk-java-v2-fe");
- if (!Strings.isNullOrEmpty(externalId)) {
- builder.externalId(externalId);
- }
- }).build();
+ // Cache role-based credentials by roleArn:externalId
+ String cacheKey = "v2:" + roleArn + ":" + (externalId != null ?
externalId : "");
+ return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+ StsClient stsClient = StsClient.builder()
+ .credentialsProvider(getCachedV2ChainProvider())
+ .build();
Review Comment:
The StsClient instances created within the cached credential providers are
never explicitly closed, potentially leading to resource leaks. The StsClient
implements AutoCloseable and should be closed to release its underlying HTTP
client resources. Consider managing the lifecycle of StsClient instances
properly, either by caching them separately or ensuring they are closed when
the credential provider is no longer needed.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/S3Properties.java:
##########
@@ -310,19 +333,24 @@ private AwsCredentialsProvider
getAwsCredentialsProviderV1() {
return credentialsProvider;
}
if (StringUtils.isNotBlank(s3IAMRole)) {
- StsClient stsClient = StsClient.builder()
- .region(Region.of(region))
-
.credentialsProvider(InstanceProfileCredentialsProvider.create())
- .build();
-
- return StsAssumeRoleCredentialsProvider.builder()
- .stsClient(stsClient)
- .refreshRequest(builder -> {
-
builder.roleArn(s3IAMRole).roleSessionName("aws-sdk-java-v2-fe");
- if (StringUtils.isNotBlank(s3ExternalId)) {
- builder.externalId(s3ExternalId);
- }
- }).build();
+ // Cache role-based credentials by region:roleArn:externalId:mode
+ String cacheKey = "v1:" + region + ":" + s3IAMRole + ":"
+ + (s3ExternalId != null ? s3ExternalId : "");
+ return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+ StsClient stsClient = StsClient.builder()
+ .region(Region.of(region))
+
.credentialsProvider(getCachedInstanceProfileProvider())
+ .build();
Review Comment:
The StsClient instances created within the cached credential providers are
never explicitly closed, potentially leading to resource leaks. The StsClient
implements AutoCloseable and should be closed to release its underlying HTTP
client resources. Consider managing the lifecycle of StsClient instances
properly, either by caching them separately or ensuring they are closed when
the credential provider is no longer needed.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/property/metastore/AWSGlueMetaStoreBaseProperties.java:
##########
@@ -40,10 +40,38 @@
import
software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider;
import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class AWSGlueMetaStoreBaseProperties {
+ // Cached credential providers to prevent thread leak.
+ // Each credential provider creates internal ScheduledExecutorService
threads for credential refresh.
+ // Without caching, every call creates new threads that are never cleaned
up.
+ private static volatile AwsCredentialsProvider cachedDefaultChainProvider
= null;
+ private static final Object LOCK = new Object();
+
+ // Cache for role-based credential providers, keyed by
region:roleArn:externalId
+ private static final ConcurrentMap<String, AwsCredentialsProvider>
ROLE_CREDENTIALS_CACHE =
+ new ConcurrentHashMap<>();
Review Comment:
The credential provider caches use ConcurrentHashMap with no eviction
policy, which means cached providers will remain in memory indefinitely. This
could lead to memory accumulation over time if many different credential
configurations are used throughout the application lifecycle. Consider
implementing a cache eviction strategy using a bounded cache implementation
like Caffeine or Guava's Cache with time-based or size-based eviction.
##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -177,32 +231,24 @@ private static AwsCredentialsProvider
getAwsCredencialsProviderV2(URI endpoint,
}
if (!Strings.isNullOrEmpty(roleArn)) {
- StsClient stsClient = StsClient.builder()
- .credentialsProvider(AwsCredentialsProviderChain.of(
- WebIdentityTokenFileCredentialsProvider.create(),
- ContainerCredentialsProvider.create(),
- InstanceProfileCredentialsProvider.create(),
- SystemPropertyCredentialsProvider.create(),
- EnvironmentVariableCredentialsProvider.create(),
- ProfileCredentialsProvider.create()))
- .build();
-
- return StsAssumeRoleCredentialsProvider.builder()
- .stsClient(stsClient)
- .refreshRequest(builder -> {
-
builder.roleArn(roleArn).roleSessionName("aws-sdk-java-v2-fe");
- if (!Strings.isNullOrEmpty(externalId)) {
- builder.externalId(externalId);
- }
- }).build();
+ // Cache role-based credentials by roleArn:externalId
+ String cacheKey = "v2:" + roleArn + ":" + (externalId != null ?
externalId : "");
+ return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
+ StsClient stsClient = StsClient.builder()
+ .credentialsProvider(getCachedV2ChainProvider())
+ .build();
Review Comment:
The cache key for role-based credentials is missing the region parameter,
which could cause credentials from different regions to be incorrectly shared.
The cache key should include the region to prevent credential providers
configured for different AWS regions from sharing the same cached instance.
This could lead to authentication failures when the same roleArn is used across
multiple regions.
```suggestion
// Cache role-based credentials by roleArn:externalId:region
String cacheKey = "v2:" + roleArn + ":" + (externalId != null ?
externalId : "")
+ ":" + (region != null ? region : "");
return ROLE_CREDENTIALS_CACHE.computeIfAbsent(cacheKey, k -> {
StsClient.Builder stsBuilder = StsClient.builder()
.credentialsProvider(getCachedV2ChainProvider());
if (!Strings.isNullOrEmpty(region)) {
stsBuilder = stsBuilder.region(Region.of(region));
}
StsClient stsClient = stsBuilder.build();
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]