On Wed, Sep 11, 2024 at 10:39:26AM UTC, Yufei Gu wrote:
> >
> > - the storage setup for S3 should have parameter for the bucket region
> > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
> > - if the parameter is not specified, then Polaris attempts to look up
> > (get-bucket-location) the region.
> 
> 
> I'd always prefer the second one. It provides more flexibility. For
> example, it's likely that the regions are different for two namespaces
> under the same catalog. Polaris may not support namespace-level or
> table-level storage location well at this point, but it makes sense to do
> so. This seems like a big topic. I will not expand here, we can discuss
> separately.

Agreed. I am partial to that option as well. I just wanted to have an option
for the user to specify it in the configuration. 

> 
> Plus, the assume-role configured for catalog should have permission to
> infer the region.

Agree. We should add this into Polaris docs; currently this is implicit
and the docs don't explicitly talk about ensuring this permission has been
given. I will raise a PR for that as well. 

--
aniket

> 
> Yufei
> 
> 
> On Wed, Sep 11, 2024 at 10:03 AM Yufei Gu <flyrain...@gmail.com> wrote:
> 
> > Sorry, just realized that Edward has a new proposal to standardize
> > credentials, https://github.com/apache/iceberg/pull/10722. In that case,
> > we should add client.region as a field of AWS credential schema. I will
> > comment on it in the PR.
> >
> > > Does the Polaris community intend to maintain a separate REST protocol
> > and only sometimes upstream changes?
> >
> > I'm with JB for not diverging from the Iceberg REST spec. This is mainly
> > for consistency, and to ensure interoperability across engines.
> >
> > Yufei
> >
> >
> > On Wed, Sep 11, 2024 at 9:00 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> About your point Ryan, my "view" on that is that Polaris should follow
> >> the REST spec, and the REST spec is at Iceberg.
> >>
> >> I don't think it would be a good idea to have a separate REST protocol
> >> in Polaris, especially for engine interoperability about Iceberg
> >> (engines should know only the Iceberg REST Spec/Client).
> >> So, I'm more in favor of proposing directly REST spec changes at Iceberg.
> >>
> >> Thoughts ?
> >>
> >> Regards
> >> JB
> >>
> >> On Wed, Sep 11, 2024 at 5:29 PM rdb...@gmail.com <rdb...@gmail.com>
> >> wrote:
> >> >
> >> > Thanks for finding this, Aniket. It sounds like a good thing to fix in
> >> the
> >> > spec to me.
> >> >
> >> > This also brings up a question for the Polaris community. JB said he
> >> would
> >> > “draft a proposal to update the Iceberg REST Spec *as well*“. Does the
> >> > Polaris community intend to maintain a separate REST protocol and only
> >> > sometimes upstream changes? I think it would be good to have clear
> >> guidance
> >> > on this so that it is clear when an update should be taken upstream
> >> first
> >> > vs decided here and later standardized.
> >> >
> >> > Ryan
> >> >
> >> > On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> >> > wrote:
> >> >
> >> > > Hi Aniket
> >> > >
> >> > > It looks good to me. I think AwsStorageConfigurationInfo is not
> >> > > enough, we also need the client.region in the Iceberg REST Spec (for
> >> > > consistency between engines).
> >> > >
> >> > > If there's no objection, I would draft a proposal to update the
> >> > > Iceberg REST Spec as well.
> >> > > As a workaround, we can store the client.region as property in the
> >> > > Polaris entity (e.g. table).
> >> > >
> >> > > Thoughts ?
> >> > >
> >> > > Regards
> >> > > JB
> >> > >
> >> > > On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni
> >> > > <aniket.kulka...@dremio.com.invalid> wrote:
> >> > > >
> >> > > > For iceberg tables stored in AWS S3 buckets, knowing the region of
> >> the
> >> > > bucket is critical for engines using vended credentials (when
> >> configured)
> >> > > to access a table.
> >> > > >
> >> > > > E.g - the vended credentials for AWS look like this
> >> > > >
> >> > > > { "s3.access-key-id": "ASI....”,
> >> > > >   "s3.secret-access-key": "gbVT9PpFBY...”,
> >> > > >   "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”,
> >> > > >    "expiration-time": “1725572949000” }
> >> > > >
> >> > > > An engine consuming this, would need to either infer (s3api
> >> > > get-bucket-location) the region or ask the end user to provide the
> >> region
> >> > > separately which misses the point of vended credentials.
> >> > > >
> >> > > > A engine engine cannot use get-bucket-location, because the
> >> credential
> >> > > generation explicitly allows only s3:GetObject, s3:GetObjectVersion,
> >> > > s3:PutObject, s3:DeletObject, s3:ListBucket for the table location
> >> prefix.
> >> > > Refer -
> >> > >
> >> org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString
> >> > > >
> >> > > > I propose that
> >> > > >
> >> > > > - the storage setup for S3 should have parameter for the bucket
> >> region
> >> > > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
> >> > > > - if the parameter is not specified, then Polaris attempts to look
> >> up
> >> > > (get-bucket-location) the region.
> >> > > > - the information is returned in vended credentials (if enabled) as
> >> > > "s3.region”:…
> >> > > >
> >> > > > Note - another option could be to allow ’s3:GetBucketLocation’ in
> >> the
> >> > > policyString when generating vended credentials’ iam role, but that
> >> is sub
> >> > > optimal and therefore I am not proposing it. It would engines to make
> >> > > multiple get-bucket-location calls - one per table being looked up.
> >> > > >
> >> > > > --
> >> > > > aniket
> >> > > >
> >> > >
> >>
> >

Reply via email to