About your point Ryan, my "view" on that is that Polaris should follow the REST spec, and the REST spec is at Iceberg.
I don't think it would be a good idea to have a separate REST protocol in Polaris, especially for engine interoperability about Iceberg (engines should know only the Iceberg REST Spec/Client). So, I'm more in favor of proposing directly REST spec changes at Iceberg. Thoughts ? Regards JB On Wed, Sep 11, 2024 at 5:29 PM rdb...@gmail.com <rdb...@gmail.com> wrote: > > Thanks for finding this, Aniket. It sounds like a good thing to fix in the > spec to me. > > This also brings up a question for the Polaris community. JB said he would > “draft a proposal to update the Iceberg REST Spec *as well*“. Does the > Polaris community intend to maintain a separate REST protocol and only > sometimes upstream changes? I think it would be good to have clear guidance > on this so that it is clear when an update should be taken upstream first > vs decided here and later standardized. > > Ryan > > On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net> > wrote: > > > Hi Aniket > > > > It looks good to me. I think AwsStorageConfigurationInfo is not > > enough, we also need the client.region in the Iceberg REST Spec (for > > consistency between engines). > > > > If there's no objection, I would draft a proposal to update the > > Iceberg REST Spec as well. > > As a workaround, we can store the client.region as property in the > > Polaris entity (e.g. table). > > > > Thoughts ? > > > > Regards > > JB > > > > On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni > > <aniket.kulka...@dremio.com.invalid> wrote: > > > > > > For iceberg tables stored in AWS S3 buckets, knowing the region of the > > bucket is critical for engines using vended credentials (when configured) > > to access a table. > > > > > > E.g - the vended credentials for AWS look like this > > > > > > { "s3.access-key-id": "ASI....”, > > > "s3.secret-access-key": "gbVT9PpFBY...”, > > > "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”, > > > "expiration-time": “1725572949000” } > > > > > > An engine consuming this, would need to either infer (s3api > > get-bucket-location) the region or ask the end user to provide the region > > separately which misses the point of vended credentials. > > > > > > A engine engine cannot use get-bucket-location, because the credential > > generation explicitly allows only s3:GetObject, s3:GetObjectVersion, > > s3:PutObject, s3:DeletObject, s3:ListBucket for the table location prefix. > > Refer - > > org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString > > > > > > I propose that > > > > > > - the storage setup for S3 should have parameter for the bucket region > > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo) > > > - if the parameter is not specified, then Polaris attempts to look up > > (get-bucket-location) the region. > > > - the information is returned in vended credentials (if enabled) as > > "s3.region”:… > > > > > > Note - another option could be to allow ’s3:GetBucketLocation’ in the > > policyString when generating vended credentials’ iam role, but that is sub > > optimal and therefore I am not proposing it. It would engines to make > > multiple get-bucket-location calls - one per table being looked up. > > > > > > -- > > > aniket > > > > >