For Iceberg tables stored in AWS S3 buckets, knowing the region of the bucket 
is critical for engines that use vended credentials (when configured) to access a 
table.

For example, the vended credentials for AWS look like this:

{ "s3.access-key-id": "ASI....",
  "s3.secret-access-key": "gbVT9PpFBY...",
  "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...",
  "expiration-time": "1725572949000" }

An engine consuming this would need to either infer the region (s3api 
get-bucket-location) or ask the end user to provide it separately, which 
defeats the purpose of vended credentials. 

An engine cannot use get-bucket-location, because the credential 
generation explicitly allows only s3:GetObject, s3:GetObjectVersion, 
s3:PutObject, s3:DeleteObject, and s3:ListBucket for the table location prefix. 
See 
org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString

I propose that:

- the storage setup for S3 should have a parameter for the bucket region 
(org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo);
- if the parameter is not specified, Polaris attempts to look up 
(get-bucket-location) the region;
- the region is returned in the vended credentials (if enabled) as 
"s3.region":…
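
To make the proposed flow concrete, here is a rough, language-neutral sketch in 
Python (not Polaris code). All function names below are hypothetical and exist 
only for illustration; the lookup itself is passed in as a callable so the sketch 
does not assume any particular AWS client. One detail that I believe matters for 
the lookup path: get-bucket-location reports an empty LocationConstraint for 
buckets in us-east-1 (and the legacy value "EU" for eu-west-1), so the result 
would need normalizing before it is returned to engines.

```python
from typing import Callable, Dict, Optional


def normalize_location_constraint(constraint: Optional[str]) -> str:
    """Map a get-bucket-location LocationConstraint to a region name."""
    if not constraint:
        # S3 reports a null/empty constraint for buckets in us-east-1.
        return "us-east-1"
    if constraint == "EU":
        # Legacy constraint value used for eu-west-1.
        return "eu-west-1"
    return constraint


def resolve_bucket_region(
    configured_region: Optional[str],
    lookup_region: Callable[[], Optional[str]],
) -> Optional[str]:
    """Prefer the region from the storage configuration, else look it up."""
    if configured_region:
        return configured_region
    # Hypothetical fallback: the catalog performs the lookup once, server side.
    return lookup_region()


def with_region(credentials: Dict[str, str], region: Optional[str]) -> Dict[str, str]:
    """Return a copy of the vended credentials with "s3.region" added if known."""
    vended = dict(credentials)
    if region:
        vended["s3.region"] = region
    return vended


if __name__ == "__main__":
    creds = {"s3.access-key-id": "ASI....", "expiration-time": "1725572949000"}
    # No region configured: fall back to a (stubbed) lookup that returned None,
    # which is what get-bucket-location reports for a us-east-1 bucket.
    region = resolve_bucket_region(None, lambda: normalize_location_constraint(None))
    print(with_region(creds, region))
```

With this shape, the lookup cost is paid by the catalog rather than by every 
engine, and an explicitly configured region always wins over the lookup.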

Note: another option would be to allow 's3:GetBucketLocation' in the 
policyString when generating the vended credentials' IAM role, but that is 
suboptimal, so I am not proposing it. It would require engines to make multiple 
get-bucket-location calls, one per table being looked up. 

--
aniket
