About your point Ryan, my "view" on that is that Polaris should follow
the REST spec, and the REST spec is at Iceberg.

I don't think it would be a good idea to have a separate REST protocol
in Polaris, especially for engine interoperability about Iceberg
(engines should know only the Iceberg REST Spec/Client).
So, I'm more in favor of proposing directly REST spec changes at Iceberg.

Thoughts ?

Regards
JB

On Wed, Sep 11, 2024 at 5:29 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
> Thanks for finding this, Aniket. It sounds like a good thing to fix in the
> spec to me.
>
> This also brings up a question for the Polaris community. JB said he would
> “draft a proposal to update the Iceberg REST Spec *as well*“. Does the
> Polaris community intend to maintain a separate REST protocol and only
> sometimes upstream changes? I think it would be good to have clear guidance
> on this so that it is clear when an update should be taken upstream first
> vs decided here and later standardized.
>
> Ryan
>
> On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi Aniket
> >
> > It looks good to me. I think AwsStorageConfigurationInfo is not
> > enough, we also need the client.region in the Iceberg REST Spec (for
> > consistency between engines).
> >
> > If there's no objection, I would draft a proposal to update the
> > Iceberg REST Spec as well.
> > As a workaround, we can store the client.region as property in the
> > Polaris entity (e.g. table).
> >
> > Thoughts ?
> >
> > Regards
> > JB
> >
> > On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni
> > <aniket.kulka...@dremio.com.invalid> wrote:
> > >
> > > For iceberg tables stored in AWS S3 buckets, knowing the region of the
> > bucket is critical for engines using vended credentials (when configured)
> > to access a table.
> > >
> > > E.g - the vended credentials for AWS look like this
> > >
> > > { "s3.access-key-id": "ASI....”,
> > >   "s3.secret-access-key": "gbVT9PpFBY...”,
> > >   "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”,
> > >    "expiration-time": “1725572949000” }
> > >
> > > An engine consuming this, would need to either infer (s3api
> > get-bucket-location) the region or ask the end user to provide the region
> > separately which misses the point of vended credentials.
> > >
> > > A engine engine cannot use get-bucket-location, because the credential
> > generation explicitly allows only s3:GetObject, s3:GetObjectVersion,
> > s3:PutObject, s3:DeletObject, s3:ListBucket for the table location prefix.
> > Refer -
> > org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString
> > >
> > > I propose that
> > >
> > > - the storage setup for S3 should have parameter for the bucket region
> > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
> > > - if the parameter is not specified, then Polaris attempts to look up
> > (get-bucket-location) the region.
> > > - the information is returned in vended credentials (if enabled) as
> > "s3.region”:…
> > >
> > > Note - another option could be to allow ’s3:GetBucketLocation’ in the
> > policyString when generating vended credentials’ iam role, but that is sub
> > optimal and therefore I am not proposing it. It would engines to make
> > multiple get-bucket-location calls - one per table being looked up.
> > >
> > > --
> > > aniket
> > >
> >

Reply via email to