Thanks for finding this, Aniket. It sounds like a good thing to fix in the spec to me.
This also brings up a question for the Polaris community. JB said he would “draft a proposal to update the Iceberg REST Spec *as well*“. Does the Polaris community intend to maintain a separate REST protocol and only sometimes upstream changes? I think it would be good to have clear guidance on this so that it is clear when an update should be taken upstream first vs decided here and later standardized. Ryan On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > Hi Aniket > > It looks good to me. I think AwsStorageConfigurationInfo is not > enough, we also need the client.region in the Iceberg REST Spec (for > consistency between engines). > > If there's no objection, I would draft a proposal to update the > Iceberg REST Spec as well. > As a workaround, we can store the client.region as property in the > Polaris entity (e.g. table). > > Thoughts ? > > Regards > JB > > On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni > <aniket.kulka...@dremio.com.invalid> wrote: > > > > For iceberg tables stored in AWS S3 buckets, knowing the region of the > bucket is critical for engines using vended credentials (when configured) > to access a table. > > > > E.g - the vended credentials for AWS look like this > > > > { "s3.access-key-id": "ASI....”, > > "s3.secret-access-key": "gbVT9PpFBY...”, > > "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”, > > "expiration-time": “1725572949000” } > > > > An engine consuming this, would need to either infer (s3api > get-bucket-location) the region or ask the end user to provide the region > separately which misses the point of vended credentials. > > > > A engine engine cannot use get-bucket-location, because the credential > generation explicitly allows only s3:GetObject, s3:GetObjectVersion, > s3:PutObject, s3:DeletObject, s3:ListBucket for the table location prefix. > Refer - > org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString > > > > I propose that > > > > - the storage setup for S3 should have parameter for the bucket region > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo) > > - if the parameter is not specified, then Polaris attempts to look up > (get-bucket-location) the region. > > - the information is returned in vended credentials (if enabled) as > "s3.region”:… > > > > Note - another option could be to allow ’s3:GetBucketLocation’ in the > policyString when generating vended credentials’ iam role, but that is sub > optimal and therefore I am not proposing it. It would engines to make > multiple get-bucket-location calls - one per table being looked up. > > > > -- > > aniket > > >