On Wed, Sep 11, 2024 at 10:39:26AM UTC, Yufei Gu wrote: > > > > - the storage setup for S3 should have parameter for the bucket region > > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo) > > - if the parameter is not specified, then Polaris attempts to look up > > (get-bucket-location) the region. > > > I'd always prefer the second one. It provides more flexibility. For > example, it's likely that the regions are different for two namespaces > under the same catalog. Polaris may not support namespace-level or > table-level storage location well at this point, but it makes sense to do > so. This seems like a big topic. I will not expand here, we can discuss > separately.
Agreed. I am partial to that option as well. I just wanted to have an option for the user to specify it in the configuration. > > Plus, the assume-role configured for catalog should have permission to > infer the region. Agree. We should add this into Polaris docs; currently this is implicit and the docs don't explicitly talk about ensuring this permission has been given. I will raise a PR for that as well. -- aniket > > Yufei > > > On Wed, Sep 11, 2024 at 10:03 AM Yufei Gu <flyrain...@gmail.com> wrote: > > > Sorry, just realized that Edward has a new proposal to standardize > > credentials, https://github.com/apache/iceberg/pull/10722. In that case, > > we should add client.region as a field of AWS credential schema. I will > > comment on it in the PR. > > > > > Does the Polaris community intend to maintain a separate REST protocol > > and only sometimes upstream changes? > > > > I'm with JB for not diverging from the Iceberg REST spec. This is mainly > > for consistency, and to ensure interoperability across engines. > > > > Yufei > > > > > > On Wed, Sep 11, 2024 at 9:00 AM Jean-Baptiste Onofré <j...@nanthrax.net> > > wrote: > > > >> About your point Ryan, my "view" on that is that Polaris should follow > >> the REST spec, and the REST spec is at Iceberg. > >> > >> I don't think it would be a good idea to have a separate REST protocol > >> in Polaris, especially for engine interoperability about Iceberg > >> (engines should know only the Iceberg REST Spec/Client). > >> So, I'm more in favor of proposing directly REST spec changes at Iceberg. > >> > >> Thoughts ? > >> > >> Regards > >> JB > >> > >> On Wed, Sep 11, 2024 at 5:29 PM rdb...@gmail.com <rdb...@gmail.com> > >> wrote: > >> > > >> > Thanks for finding this, Aniket. It sounds like a good thing to fix in > >> the > >> > spec to me. > >> > > >> > This also brings up a question for the Polaris community. JB said he > >> would > >> > “draft a proposal to update the Iceberg REST Spec *as well*“. Does the > >> > Polaris community intend to maintain a separate REST protocol and only > >> > sometimes upstream changes? I think it would be good to have clear > >> guidance > >> > on this so that it is clear when an update should be taken upstream > >> first > >> > vs decided here and later standardized. > >> > > >> > Ryan > >> > > >> > On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net> > >> > wrote: > >> > > >> > > Hi Aniket > >> > > > >> > > It looks good to me. I think AwsStorageConfigurationInfo is not > >> > > enough, we also need the client.region in the Iceberg REST Spec (for > >> > > consistency between engines). > >> > > > >> > > If there's no objection, I would draft a proposal to update the > >> > > Iceberg REST Spec as well. > >> > > As a workaround, we can store the client.region as property in the > >> > > Polaris entity (e.g. table). > >> > > > >> > > Thoughts ? > >> > > > >> > > Regards > >> > > JB > >> > > > >> > > On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni > >> > > <aniket.kulka...@dremio.com.invalid> wrote: > >> > > > > >> > > > For iceberg tables stored in AWS S3 buckets, knowing the region of > >> the > >> > > bucket is critical for engines using vended credentials (when > >> configured) > >> > > to access a table. > >> > > > > >> > > > E.g - the vended credentials for AWS look like this > >> > > > > >> > > > { "s3.access-key-id": "ASI....”, > >> > > > "s3.secret-access-key": "gbVT9PpFBY...”, > >> > > > "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”, > >> > > > "expiration-time": “1725572949000” } > >> > > > > >> > > > An engine consuming this, would need to either infer (s3api > >> > > get-bucket-location) the region or ask the end user to provide the > >> region > >> > > separately which misses the point of vended credentials. > >> > > > > >> > > > A engine engine cannot use get-bucket-location, because the > >> credential > >> > > generation explicitly allows only s3:GetObject, s3:GetObjectVersion, > >> > > s3:PutObject, s3:DeletObject, s3:ListBucket for the table location > >> prefix. > >> > > Refer - > >> > > > >> org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString > >> > > > > >> > > > I propose that > >> > > > > >> > > > - the storage setup for S3 should have parameter for the bucket > >> region > >> > > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo) > >> > > > - if the parameter is not specified, then Polaris attempts to look > >> up > >> > > (get-bucket-location) the region. > >> > > > - the information is returned in vended credentials (if enabled) as > >> > > "s3.region”:… > >> > > > > >> > > > Note - another option could be to allow ’s3:GetBucketLocation’ in > >> the > >> > > policyString when generating vended credentials’ iam role, but that > >> is sub > >> > > optimal and therefore I am not proposing it. It would engines to make > >> > > multiple get-bucket-location calls - one per table being looked up. > >> > > > > >> > > > -- > >> > > > aniket > >> > > > > >> > > > >> > >