> > - the storage setup for S3 should have parameter for the bucket region > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo) > - if the parameter is not specified, then Polaris attempts to look up > (get-bucket-location) the region.
I'd always prefer the second one. It provides more flexibility. For example, it's likely that the regions are different for two namespaces under the same catalog. Polaris may not support namespace-level or table-level storage location well at this point, but it makes sense to do so. This seems like a big topic. I will not expand here, we can discuss separately. Plus, the assume-role configured for catalog should have permission to infer the region. Yufei On Wed, Sep 11, 2024 at 10:03 AM Yufei Gu <flyrain...@gmail.com> wrote: > Sorry, just realized that Edward has a new proposal to standardize > credentials, https://github.com/apache/iceberg/pull/10722. In that case, > we should add client.region as a field of AWS credential schema. I will > comment on it in the PR. > > > Does the Polaris community intend to maintain a separate REST protocol > and only sometimes upstream changes? > > I'm with JB for not diverging from the Iceberg REST spec. This is mainly > for consistency, and to ensure interoperability across engines. > > Yufei > > > On Wed, Sep 11, 2024 at 9:00 AM Jean-Baptiste Onofré <j...@nanthrax.net> > wrote: > >> About your point Ryan, my "view" on that is that Polaris should follow >> the REST spec, and the REST spec is at Iceberg. >> >> I don't think it would be a good idea to have a separate REST protocol >> in Polaris, especially for engine interoperability about Iceberg >> (engines should know only the Iceberg REST Spec/Client). >> So, I'm more in favor of proposing directly REST spec changes at Iceberg. >> >> Thoughts ? >> >> Regards >> JB >> >> On Wed, Sep 11, 2024 at 5:29 PM rdb...@gmail.com <rdb...@gmail.com> >> wrote: >> > >> > Thanks for finding this, Aniket. It sounds like a good thing to fix in >> the >> > spec to me. >> > >> > This also brings up a question for the Polaris community. JB said he >> would >> > “draft a proposal to update the Iceberg REST Spec *as well*“. Does the >> > Polaris community intend to maintain a separate REST protocol and only >> > sometimes upstream changes? I think it would be good to have clear >> guidance >> > on this so that it is clear when an update should be taken upstream >> first >> > vs decided here and later standardized. >> > >> > Ryan >> > >> > On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net> >> > wrote: >> > >> > > Hi Aniket >> > > >> > > It looks good to me. I think AwsStorageConfigurationInfo is not >> > > enough, we also need the client.region in the Iceberg REST Spec (for >> > > consistency between engines). >> > > >> > > If there's no objection, I would draft a proposal to update the >> > > Iceberg REST Spec as well. >> > > As a workaround, we can store the client.region as property in the >> > > Polaris entity (e.g. table). >> > > >> > > Thoughts ? >> > > >> > > Regards >> > > JB >> > > >> > > On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni >> > > <aniket.kulka...@dremio.com.invalid> wrote: >> > > > >> > > > For iceberg tables stored in AWS S3 buckets, knowing the region of >> the >> > > bucket is critical for engines using vended credentials (when >> configured) >> > > to access a table. >> > > > >> > > > E.g - the vended credentials for AWS look like this >> > > > >> > > > { "s3.access-key-id": "ASI....”, >> > > > "s3.secret-access-key": "gbVT9PpFBY...”, >> > > > "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”, >> > > > "expiration-time": “1725572949000” } >> > > > >> > > > An engine consuming this, would need to either infer (s3api >> > > get-bucket-location) the region or ask the end user to provide the >> region >> > > separately which misses the point of vended credentials. >> > > > >> > > > A engine engine cannot use get-bucket-location, because the >> credential >> > > generation explicitly allows only s3:GetObject, s3:GetObjectVersion, >> > > s3:PutObject, s3:DeletObject, s3:ListBucket for the table location >> prefix. >> > > Refer - >> > > >> org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString >> > > > >> > > > I propose that >> > > > >> > > > - the storage setup for S3 should have parameter for the bucket >> region >> > > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo) >> > > > - if the parameter is not specified, then Polaris attempts to look >> up >> > > (get-bucket-location) the region. >> > > > - the information is returned in vended credentials (if enabled) as >> > > "s3.region”:… >> > > > >> > > > Note - another option could be to allow ’s3:GetBucketLocation’ in >> the >> > > policyString when generating vended credentials’ iam role, but that >> is sub >> > > optimal and therefore I am not proposing it. It would engines to make >> > > multiple get-bucket-location calls - one per table being looked up. >> > > > >> > > > -- >> > > > aniket >> > > > >> > > >> >