pw24 opened a new issue, #8190:
URL: https://github.com/apache/iceberg/issues/8190
### Apache Iceberg version
1.3.1 (latest release)
### Query engine
Hive
### Please describe the bug 🐞
Hi, I've gone back and forth for a couple of days now between Slack and the
Iceberg documentation (which really didn't provide any clarity on the matter),
and I'm still unable to access Iceberg tables located on AWS S3. Metadata
commands (listing databases and tables, with the URI set to the Hive Metastore
service) work; however, when I try to access any table data on S3, I receive
`AWS S3 403 Forbidden` exceptions.
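To make "metadata commands work" concrete, calls like the following succeed
(a minimal sketch, using the `catalog` that is initialized in the test snippet
further down):
```java
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;

// Listing databases and tables through the Hive Metastore works fine;
// only reading actual table data from S3 fails.
for (Namespace ns : catalog.listNamespaces()) {
  for (TableIdentifier tableId : catalog.listTables(ns)) {
    System.out.println(tableId);
  }
}
```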
I can confirm that my S3 credentials are valid: I'm able to list a given
bucket's objects with the boto3 Python package, so it isn't credential
related. That narrows the problem down to my connection/config setup or an
API bug.
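For reference, here is a minimal Java equivalent of that boto3 sanity check,
using the aws-java-sdk-bundle already on my classpath (`<bucket>` and
`<region>` are placeholders; the credentials come from the default provider
chain):
```java
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3ObjectSummary;

// Sanity check: list bucket objects with the same credentials the JVM sees.
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
    .withCredentials(new DefaultAWSCredentialsProviderChain())
    .withRegion("<region>")
    .build();
for (S3ObjectSummary obj : s3.listObjectsV2("<bucket>").getObjectSummaries()) {
  System.out.println(obj.getKey()); // this works, so the credentials are fine
}
```
My config setup is as follows: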
- jar dependencies (maven project pom.xml):
```xml
<dependencies>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-hive-runtime</artifactId>
    <version>1.3.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>3.3.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.3.2</version>
  </dependency>
  <!-- Parsing dependency on hive meta store commands -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapred</artifactId>
    <version>0.22.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>3.1.3</version>
  </dependency>
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-bundle</artifactId>
    <version>1.11.1026</version>
  </dependency>
</dependencies>
```
- Table access test logic with Iceberg Java API - 1.3.1:
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.Table;
import org.apache.iceberg.aws.AwsProperties;
import org.apache.iceberg.aws.s3.S3FileIOProperties;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

Map<String, String> properties = new HashMap<>();
this.config = config;
properties.put(CatalogProperties.URI, config.Uri);
properties.put(CatalogProperties.CACHE_ENABLED, "false");
properties.put(CatalogProperties.WAREHOUSE_LOCATION, config.DatalakeLocation);
properties.put(CatalogProperties.CLIENT_POOL_SIZE, "2");
properties.put(S3FileIOProperties.SESSION_TOKEN, "<s3 jwt>");
// tested with and without the following params
properties.put(S3FileIOProperties.USE_ARN_REGION_ENABLED, "true");
properties.put(AwsProperties.CLIENT_ASSUME_ROLE_ARN, "arn:aws:iam::<content>");
properties.put(AwsProperties.CLIENT_ASSUME_ROLE_REGION, "<region>");

catalog = new HiveCatalog();
catalog.initialize("iceberg", properties);

// test logic for accessing a table on S3
TableIdentifier ti = TableIdentifier.of(namespace, tableName);
Table table = this.catalog.loadTable(ti);
String location = table.location();
```
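- For reference: the stack traces below go through `HadoopInputFile`/`S3AFileSystem` (the `s3a://` scheme) rather than Iceberg's `S3FileIO`, so my understanding is that credentials would also have to reach the Hadoop layer. A minimal sketch of what I'd expect that wiring to look like (this is an assumption on my part; the `fs.s3a.*` keys come from hadoop-aws, and the values are placeholders):
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hive.HiveCatalog;

// Assumption: hand the session credentials to the s3a:// layer via the
// Hadoop configuration, since S3AFileSystem shows up in the traces.
Configuration conf = new Configuration();
conf.set("fs.s3a.access.key", "<access key>");       // placeholder
conf.set("fs.s3a.secret.key", "<secret key>");       // placeholder
conf.set("fs.s3a.session.token", "<session token>"); // placeholder
conf.set("fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");

HiveCatalog catalog = new HiveCatalog();
catalog.setConf(conf);                     // set before initialize()
catalog.initialize("iceberg", properties); // same properties map as above
```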
- Set Environment variables:

- Exception stack trace:
```bash
Exception in thread "main" org.apache.iceberg.exceptions.RuntimeIOException: Failed to open input stream for file: s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:187)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:272)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266)
    at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:189)
    at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:208)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:208)
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:185)
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:180)
    at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:178)
    at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
    at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
    at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
    at iceclient.IceClient.TablePath(IceClient.java:133)
    at iceclient.App.main(App.java:45)
Caused by: java.nio.file.AccessDeniedException: s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json: getFileStatus on s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: AARQJMST2JNV6QR0; S3 Extended Request ID: 9fUy3vlplqMUx5ozBtvb8QEl9W4Wzp18DN5vdbcAT83EsrqMvieegEGlacOJn/wnJh0kj8vQG3Y=; Proxy: null), S3 Extended Request ID: 9fUy3vlplqMUx5ozBtvb8QEl9W4Wzp18DN5vdbcAT83EsrqMvieegEGlacOJn/wnJh0kj8vQG3Y=:403 Forbidden
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:255)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3796)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
    ... 17 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: AARQJMST2JNV6QR0; S3 Extended Request ID: 9fUy3vlplqMUx5ozBtvb8QEl9W4Wzp18DN5vdbcAT83EsrqMvieegEGlacOJn/wnJh0kj8vQG3Y=; Proxy: null), S3 Extended Request ID: 9fUy3vlplqMUx5ozBtvb8QEl9W4Wzp18DN5vdbcAT83EsrqMvieegEGlacOJn/wnJh0kj8vQG3Y=
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$10(S3AFileSystem.java:2545)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2533)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2513)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3776)
    ... 23 more
```
The environment variables also play a role here: if I run the same test on a
machine where **no AWS environment variables** are set, the stack trace
changes to a no-auth exception:
```bash
Exception in thread "main" org.apache.iceberg.exceptions.RuntimeIOException: Failed to open input stream for file: s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:187)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:272)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266)
    at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:189)
    at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:208)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:208)
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:185)
    at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:180)
    at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:178)
    at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
    at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
    at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
    at iceclient.IceClient.TablePath(IceClient.java:139)
    at iceclient.App.main(App.java:45)
Caused by: java.nio.file.AccessDeniedException: s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:212)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3799)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
    at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
    ... 17 more
Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
    at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:216)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1257)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:833)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:783)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
    at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6408)
    at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6381)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5422)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$10(S3AFileSystem.java:2545)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2533)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2513)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3776)
    ... 23 more
Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
    at com.amazonaws.auth.EnvironmentVariableCredentialsProvider.getCredentials(EnvironmentVariableCredentialsProvider.java:50)
    at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
    ... 44 more
```
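That second trace spells out the provider chain the s3a layer walks
(`TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider`), so a
quick way to see what the JVM actually has available is to print the variables
that `EnvironmentVariableCredentialsProvider` reads (a minimal sketch; it only
reports whether each variable is set):
```java
// Check which AWS environment variables are visible to the JVM.
String[] keys = {
    "AWS_ACCESS_KEY_ID", "AWS_ACCESS_KEY",
    "AWS_SECRET_ACCESS_KEY", "AWS_SECRET_KEY",
    "AWS_SESSION_TOKEN"
};
for (String key : keys) {
  System.out.println(key + " -> " + (System.getenv(key) == null ? "<unset>" : "<set>"));
}
```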
Please provide some assistance/guidance, and not just a one-liner response or
a link to the Iceberg documentation... If the answer is there, I'm not
getting it.