Hi Yufei,

you are right, I saw client.region in the spec as well (I didn't
remember this part).

So I think we are good indeed.

Regards
JB

On Wed, Sep 11, 2024 at 5:48 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
> client.region has been as a part of comment of LoadTableResult,
> https://github.com/apache/iceberg/blob/5439cbdb278232779fdd9a392bbf57f007f9bda0/open-api/rest-catalog-open-api.yaml#L3127-L3127.
> Vended credentials will go to the LoadTableResult config. I think it's fine
> to keep it as is.
>
> Yufei
>
>
> On Wed, Sep 11, 2024 at 8:31 AM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
> > Thanks for finding this, Aniket. It sounds like a good thing to fix in the
> > spec to me.
> >
> > This also brings up a question for the Polaris community. JB said he would
> > “draft a proposal to update the Iceberg REST Spec *as well*“. Does the
> > Polaris community intend to maintain a separate REST protocol and only
> > sometimes upstream changes? I think it would be good to have clear guidance
> > on this so that it is clear when an update should be taken upstream first
> > vs decided here and later standardized.
> >
> > Ryan
> >
> > On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> > > Hi Aniket
> > >
> > > It looks good to me. I think AwsStorageConfigurationInfo is not
> > > enough, we also need the client.region in the Iceberg REST Spec (for
> > > consistency between engines).
> > >
> > > If there's no objection, I would draft a proposal to update the
> > > Iceberg REST Spec as well.
> > > As a workaround, we can store the client.region as property in the
> > > Polaris entity (e.g. table).
> > >
> > > Thoughts ?
> > >
> > > Regards
> > > JB
> > >
> > > On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni
> > > <aniket.kulka...@dremio.com.invalid> wrote:
> > > >
> > > > For iceberg tables stored in AWS S3 buckets, knowing the region of the
> > > bucket is critical for engines using vended credentials (when configured)
> > > to access a table.
> > > >
> > > > E.g - the vended credentials for AWS look like this
> > > >
> > > > { "s3.access-key-id": "ASI....”,
> > > >   "s3.secret-access-key": "gbVT9PpFBY...”,
> > > >   "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”,
> > > >    "expiration-time": “1725572949000” }
> > > >
> > > > An engine consuming this, would need to either infer (s3api
> > > get-bucket-location) the region or ask the end user to provide the region
> > > separately which misses the point of vended credentials.
> > > >
> > > > A engine engine cannot use get-bucket-location, because the credential
> > > generation explicitly allows only s3:GetObject, s3:GetObjectVersion,
> > > s3:PutObject, s3:DeletObject, s3:ListBucket for the table location
> > prefix.
> > > Refer -
> > >
> > org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString
> > > >
> > > > I propose that
> > > >
> > > > - the storage setup for S3 should have parameter for the bucket region
> > > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
> > > > - if the parameter is not specified, then Polaris attempts to look up
> > > (get-bucket-location) the region.
> > > > - the information is returned in vended credentials (if enabled) as
> > > "s3.region”:…
> > > >
> > > > Note - another option could be to allow ’s3:GetBucketLocation’ in the
> > > policyString when generating vended credentials’ iam role, but that is
> > sub
> > > optimal and therefore I am not proposing it. It would engines to make
> > > multiple get-bucket-location calls - one per table being looked up.
> > > >
> > > > --
> > > > aniket
> > > >
> > >
> >

Reply via email to