pw24 opened a new issue, #8190:
URL: https://github.com/apache/iceberg/issues/8190

   ### Apache Iceberg version
   
   1.3.1 (latest release)
   
   ### Query engine
   
   Hive
   
   ### Please describe the bug 🐞
   
   Hi, I've gone back and forth for a couple of days now between Slack and the Iceberg documentation (which really didn't provide any clarity on the matter), and I'm still not able to access Iceberg tables located on AWS S3. Metadata commands (listing databases and tables, with the URI set to the Hive metastore service) work; however, when I try to access any table data on S3, I receive `403 Forbidden` exceptions from AWS S3.
   
   I can confirm that my S3 credentials are valid: I'm able to list a given bucket's objects with the boto3 Python package, so this isn't credential related. That leaves either my connection/setup configuration or an API bug. For reference, the boto3 check I ran corresponds roughly to the following Java against the aws-java-sdk-bundle listed below (a minimal sketch; the key, token, region, and bucket values are placeholders):
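
   ```java
   import com.amazonaws.auth.AWSStaticCredentialsProvider;
   import com.amazonaws.auth.BasicSessionCredentials;
   import com.amazonaws.services.s3.AmazonS3;
   import com.amazonaws.services.s3.AmazonS3ClientBuilder;
   import com.amazonaws.services.s3.model.S3ObjectSummary;

   // Sanity check: list the bucket's objects with the same temporary credentials.
   BasicSessionCredentials creds = new BasicSessionCredentials(
       "<access key>", "<secret key>", "<session token>"); // placeholders
   AmazonS3 s3 = AmazonS3ClientBuilder.standard()
       .withCredentials(new AWSStaticCredentialsProvider(creds))
       .withRegion("<region>") // placeholder
       .build();
   for (S3ObjectSummary obj : s3.listObjectsV2("<bucket>").getObjectSummaries()) {
     System.out.println(obj.getKey());
   }
   ```

   My config setup is as follows: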
   
   - jar dependencies (maven project pom.xml):
   ```xml
     <dependencies>
       <dependency>
         <groupId>org.apache.iceberg</groupId>
         <artifactId>iceberg-hive-runtime</artifactId>
         <version>1.3.1</version>
       </dependency>
       <dependency>  
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-aws</artifactId>
         <version>3.3.2</version>
       </dependency>
       <dependency> 
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-common</artifactId>
         <version>3.3.2</version>
       </dependency> 
    <!-- Needed for parsing Hive metastore commands -->
       <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-mapred</artifactId>
         <version>0.22.0</version>
       </dependency>
       <dependency>
         <groupId>org.apache.hive</groupId>
         <artifactId>hive-metastore</artifactId>
         <version>3.1.3</version>
       </dependency>
       <dependency>
         <groupId>com.amazonaws</groupId>
         <artifactId>aws-java-sdk-bundle</artifactId>
         <version>1.11.1026</version>
       </dependency> 
     </dependencies>
   ```
   - Table-access test logic with the Iceberg Java API (1.3.1):
   ```java
import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.Table;
import org.apache.iceberg.aws.AwsProperties;
import org.apache.iceberg.aws.s3.S3FileIOProperties;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

// config holds the metastore URI and warehouse location for our deployment
Map<String, String> properties = new HashMap<>();
properties.put(CatalogProperties.URI, config.Uri);
properties.put(CatalogProperties.CACHE_ENABLED, "false");
properties.put(CatalogProperties.WAREHOUSE_LOCATION, config.DatalakeLocation);
properties.put(CatalogProperties.CLIENT_POOL_SIZE, "2");

properties.put(S3FileIOProperties.SESSION_TOKEN, "<s3 jwt>");

// tested with and without the following params
properties.put(S3FileIOProperties.USE_ARN_REGION_ENABLED, "true");
properties.put(AwsProperties.CLIENT_ASSUME_ROLE_ARN, "arn:aws:iam::<content>");
properties.put(AwsProperties.CLIENT_ASSUME_ROLE_REGION, "<region>");

HiveCatalog catalog = new HiveCatalog();
catalog.initialize("iceberg", properties);

// test logic for accessing a table on S3; namespace and tableName come from the test harness
TableIdentifier ti = TableIdentifier.of(namespace, tableName);
Table table = catalog.loadTable(ti);
String location = table.location();
   ```
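
   For completeness, I've also been wondering whether the catalog needs Iceberg's own `S3FileIO` pinned explicitly for the `S3FileIOProperties` above to take effect at all. A minimal sketch of that variant (constant names from the 1.3.1 API; the key values are placeholders, and I don't know whether this is the intended setup):

   ```java
   // Sketch: route table I/O through Iceberg's S3FileIO rather than
   // Hadoop's s3a connector by pinning the FileIO implementation.
   properties.put(CatalogProperties.FILE_IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO");
   properties.put(S3FileIOProperties.ACCESS_KEY_ID, "<access key>");     // placeholder
   properties.put(S3FileIOProperties.SECRET_ACCESS_KEY, "<secret key>"); // placeholder
   // SESSION_TOKEN is already set above
   ```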
   
   - Environment variables set:
   
   
![image](https://github.com/apache/iceberg/assets/7777405/019088a3-22c4-411c-b380-6e20ffbf3724)
   
   - Exception stack trace:
   ```bash
Exception in thread "main" org.apache.iceberg.exceptions.RuntimeIOException: Failed to open input stream for file: s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json
        at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:187)
        at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:272)
        at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266)
        at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:189)
        at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:208)
        at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
        at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
        at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:208)
        at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:185)
        at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:180)
        at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:178)
        at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
        at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
        at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
        at iceclient.IceClient.TablePath(IceClient.java:133)
        at iceclient.App.main(App.java:45)
Caused by: java.nio.file.AccessDeniedException: s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json: getFileStatus on s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: AARQJMST2JNV6QR0; S3 Extended Request ID: 9fUy3vlplqMUx5ozBtvb8QEl9W4Wzp18DN5vdbcAT83EsrqMvieegEGlacOJn/wnJh0kj8vQG3Y=; Proxy: null), S3 Extended Request ID: 9fUy3vlplqMUx5ozBtvb8QEl9W4Wzp18DN5vdbcAT83EsrqMvieegEGlacOJn/wnJh0kj8vQG3Y=:403 Forbidden
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:255)
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3796)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
        at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
        ... 17 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: AARQJMST2JNV6QR0; S3 Extended Request ID: 9fUy3vlplqMUx5ozBtvb8QEl9W4Wzp18DN5vdbcAT83EsrqMvieegEGlacOJn/wnJh0kj8vQG3Y=; Proxy: null), S3 Extended Request ID: 9fUy3vlplqMUx5ozBtvb8QEl9W4Wzp18DN5vdbcAT83EsrqMvieegEGlacOJn/wnJh0kj8vQG3Y=
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$10(S3AFileSystem.java:2545)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2533)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2513)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3776)
        ... 23 more
   ```
   
   The environment variables also play a role: if I run the same test on a machine where **no AWS environment variables** are set, the stack trace changes to a `NoAuthWithAWSException`:
   
   ```bash
Exception in thread "main" org.apache.iceberg.exceptions.RuntimeIOException: Failed to open input stream for file: s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json
        at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:187)
        at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:272)
        at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266)
        at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:189)
        at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:208)
        at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
        at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
        at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:208)
        at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:185)
        at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:180)
        at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:178)
        at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
        at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
        at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
        at iceclient.IceClient.TablePath(IceClient.java:139)
        at iceclient.App.main(App.java:45)
Caused by: java.nio.file.AccessDeniedException: s3a://00000000-0000-0000-0000-000000000070-foundry/datalake/ml/temp_rect_table-e79b4d041ff9456cb5db38642fb4552e/metadata/00121-c8420d85-effe-4d0d-9412-b7efd4633fac.metadata.json: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:212)
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3799)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
        at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
        ... 17 more
Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
        at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:216)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1257)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:833)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:783)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5437)
        at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6408)
        at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6381)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5422)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5384)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1367)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$10(S3AFileSystem.java:2545)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2533)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2513)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3776)
        ... 23 more
Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
        at com.amazonaws.auth.EnvironmentVariableCredentialsProvider.getCredentials(EnvironmentVariableCredentialsProvider.java:50)
        at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
        ... 44 more
   ```
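
   One thing I notice in both traces: the metadata file is opened through `org.apache.hadoop.fs.s3a.S3AFileSystem` (i.e. `HadoopInputFile` and Hadoop's s3a connector), not through Iceberg's `S3FileIO`, so I suspect the `S3FileIOProperties` I set never reach the code path that actually talks to S3, and s3a only sees whatever the Hadoop `Configuration` or the environment provides. A minimal sketch of what I mean (standard s3a property names; the credential values are placeholders, and I haven't verified that this resolves the 403):

   ```java
   import org.apache.hadoop.conf.Configuration;

   import org.apache.iceberg.hive.HiveCatalog;

   // Sketch: hand temporary credentials to the s3a connector through the
   // Hadoop Configuration, which is what S3AFileSystem actually reads.
   Configuration conf = new Configuration();
   conf.set("fs.s3a.aws.credentials.provider",
       "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
   conf.set("fs.s3a.access.key", "<access key>");       // placeholder
   conf.set("fs.s3a.secret.key", "<secret key>");       // placeholder
   conf.set("fs.s3a.session.token", "<session token>"); // placeholder

   HiveCatalog catalog = new HiveCatalog();
   catalog.setConf(conf); // HiveCatalog is Configurable; HadoopFileIO picks up this conf
   catalog.initialize("iceberg", properties);
   ```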
   
   Please provide assistance/guidance rather than just a one-liner response or a link to the Iceberg documentation... if the answer is there, I'm not getting it.
   

