What would be the difference of the region specified in/for the credentials and the region that's configured for the S3FileIO (via LoadTableResult)?

On 12.09.24 06:05, Aniket Kulkarni wrote:
On Wed, Sep 11, 2024 at 10:39:26AM UTC, Yufei Gu wrote:
- the storage setup for S3 should have parameter for the bucket region
(org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
- if the parameter is not specified, then Polaris attempts to look up
(get-bucket-location) the region.

I'd always prefer the second one. It provides more flexibility. For
example, it's likely that the regions are different for two namespaces
under the same catalog. Polaris may not support namespace-level or
table-level storage location well at this point, but it makes sense to do
so. This seems like a big topic. I will not expand here, we can discuss
separately.
Agreed. I am partial to that option as well. I just wanted to have an option
for the user to specify it in the configuration.

Plus, the assume-role configured for catalog should have permission to
infer the region.
Agree. We should add this into Polaris docs; currently this is implicit
and the docs don't explicitly talk about ensuring this permission has been
given. I will raise a PR for that as well.

--
aniket

Yufei


On Wed, Sep 11, 2024 at 10:03 AM Yufei Gu <flyrain...@gmail.com> wrote:

Sorry, just realized that Edward has a new proposal to standardize
credentials, https://github.com/apache/iceberg/pull/10722. In that case,
we should add client.region as a field of AWS credential schema. I will
comment on it in the PR.

Does the Polaris community intend to maintain a separate REST protocol
and only sometimes upstream changes?

I'm with JB for not diverging from the Iceberg REST spec. This is mainly
for consistency, and to ensure interoperability across engines.

Yufei


On Wed, Sep 11, 2024 at 9:00 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

About your point Ryan, my "view" on that is that Polaris should follow
the REST spec, and the REST spec is at Iceberg.

I don't think it would be a good idea to have a separate REST protocol
in Polaris, especially for engine interoperability about Iceberg
(engines should know only the Iceberg REST Spec/Client).
So, I'm more in favor of proposing directly REST spec changes at Iceberg.

Thoughts ?

Regards
JB

On Wed, Sep 11, 2024 at 5:29 PM rdb...@gmail.com <rdb...@gmail.com>
wrote:
Thanks for finding this, Aniket. It sounds like a good thing to fix in
the
spec to me.

This also brings up a question for the Polaris community. JB said he
would
“draft a proposal to update the Iceberg REST Spec *as well*“. Does the
Polaris community intend to maintain a separate REST protocol and only
sometimes upstream changes? I think it would be good to have clear
guidance
on this so that it is clear when an update should be taken upstream
first
vs decided here and later standardized.

Ryan

On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi Aniket

It looks good to me. I think AwsStorageConfigurationInfo is not
enough, we also need the client.region in the Iceberg REST Spec (for
consistency between engines).

If there's no objection, I would draft a proposal to update the
Iceberg REST Spec as well.
As a workaround, we can store the client.region as property in the
Polaris entity (e.g. table).

Thoughts ?

Regards
JB

On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni
<aniket.kulka...@dremio.com.invalid> wrote:
For iceberg tables stored in AWS S3 buckets, knowing the region of
the
bucket is critical for engines using vended credentials (when
configured)
to access a table.
E.g - the vended credentials for AWS look like this

{ "s3.access-key-id": "ASI....”,
   "s3.secret-access-key": "gbVT9PpFBY...”,
   "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”,
    "expiration-time": “1725572949000” }

An engine consuming this, would need to either infer (s3api
get-bucket-location) the region or ask the end user to provide the
region
separately which misses the point of vended credentials.
A engine engine cannot use get-bucket-location, because the
credential
generation explicitly allows only s3:GetObject, s3:GetObjectVersion,
s3:PutObject, s3:DeletObject, s3:ListBucket for the table location
prefix.
Refer -

org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString
I propose that

- the storage setup for S3 should have parameter for the bucket
region
(org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
- if the parameter is not specified, then Polaris attempts to look
up
(get-bucket-location) the region.
- the information is returned in vended credentials (if enabled) as
"s3.region”:…
Note - another option could be to allow ’s3:GetBucketLocation’ in
the
policyString when generating vended credentials’ iam role, but that
is sub
optimal and therefore I am not proposing it. It would engines to make
multiple get-bucket-location calls - one per table being looked up.
--
aniket

--
Robert Stupp
@snazy

Reply via email to