On Thu, Sep 12, 2024 at 02:38:30PM UTC, Robert Stupp wrote:
> What would be the difference of the region specified in/for the credentials
> and the region that's configured for the S3FileIO (via LoadTableResult)?

My point is that _region_ configured for the catalog via storageConfigInfo in
createCatalog [1] is the one that should be returned in LoadTableResult
(client.region). And if the user doesn't specify a region in storageConfigInfo
then it is infered and then returned in LoadTableResult

--
aniket

[1] 
https://polaris.io/#tag/polaris-management-service_other/operation/createCatalog

> On 12.09.24 06:05, Aniket Kulkarni wrote:
> > On Wed, Sep 11, 2024 at 10:39:26AM UTC, Yufei Gu wrote:
> > > > - the storage setup for S3 should have parameter for the bucket region
> > > > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
> > > > - if the parameter is not specified, then Polaris attempts to look up
> > > > (get-bucket-location) the region.
> > > 
> > > I'd always prefer the second one. It provides more flexibility. For
> > > example, it's likely that the regions are different for two namespaces
> > > under the same catalog. Polaris may not support namespace-level or
> > > table-level storage location well at this point, but it makes sense to do
> > > so. This seems like a big topic. I will not expand here, we can discuss
> > > separately.
> > Agreed. I am partial to that option as well. I just wanted to have an option
> > for the user to specify it in the configuration.
> > 
> > > Plus, the assume-role configured for catalog should have permission to
> > > infer the region.
> > Agree. We should add this into Polaris docs; currently this is implicit
> > and the docs don't explicitly talk about ensuring this permission has been
> > given. I will raise a PR for that as well.
> > 
> > --
> > aniket
> > 
> > > Yufei
> > > 
> > > 
> > > On Wed, Sep 11, 2024 at 10:03 AM Yufei Gu <flyrain...@gmail.com> wrote:
> > > 
> > > > Sorry, just realized that Edward has a new proposal to standardize
> > > > credentials, https://github.com/apache/iceberg/pull/10722. In that case,
> > > > we should add client.region as a field of AWS credential schema. I will
> > > > comment on it in the PR.
> > > > 
> > > > > Does the Polaris community intend to maintain a separate REST protocol
> > > > and only sometimes upstream changes?
> > > > 
> > > > I'm with JB for not diverging from the Iceberg REST spec. This is mainly
> > > > for consistency, and to ensure interoperability across engines.
> > > > 
> > > > Yufei
> > > > 
> > > > 
> > > > On Wed, Sep 11, 2024 at 9:00 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > > > wrote:
> > > > 
> > > > > About your point Ryan, my "view" on that is that Polaris should follow
> > > > > the REST spec, and the REST spec is at Iceberg.
> > > > > 
> > > > > I don't think it would be a good idea to have a separate REST protocol
> > > > > in Polaris, especially for engine interoperability about Iceberg
> > > > > (engines should know only the Iceberg REST Spec/Client).
> > > > > So, I'm more in favor of proposing directly REST spec changes at 
> > > > > Iceberg.
> > > > > 
> > > > > Thoughts ?
> > > > > 
> > > > > Regards
> > > > > JB
> > > > > 
> > > > > On Wed, Sep 11, 2024 at 5:29 PM rdb...@gmail.com <rdb...@gmail.com>
> > > > > wrote:
> > > > > > Thanks for finding this, Aniket. It sounds like a good thing to fix 
> > > > > > in
> > > > > the
> > > > > > spec to me.
> > > > > > 
> > > > > > This also brings up a question for the Polaris community. JB said he
> > > > > would
> > > > > > “draft a proposal to update the Iceberg REST Spec *as well*“. Does 
> > > > > > the
> > > > > > Polaris community intend to maintain a separate REST protocol and 
> > > > > > only
> > > > > > sometimes upstream changes? I think it would be good to have clear
> > > > > guidance
> > > > > > on this so that it is clear when an update should be taken upstream
> > > > > first
> > > > > > vs decided here and later standardized.
> > > > > > 
> > > > > > Ryan
> > > > > > 
> > > > > > On Wed, Sep 11, 2024 at 7:34 AM Jean-Baptiste Onofré 
> > > > > > <j...@nanthrax.net>
> > > > > > wrote:
> > > > > > 
> > > > > > > Hi Aniket
> > > > > > > 
> > > > > > > It looks good to me. I think AwsStorageConfigurationInfo is not
> > > > > > > enough, we also need the client.region in the Iceberg REST Spec 
> > > > > > > (for
> > > > > > > consistency between engines).
> > > > > > > 
> > > > > > > If there's no objection, I would draft a proposal to update the
> > > > > > > Iceberg REST Spec as well.
> > > > > > > As a workaround, we can store the client.region as property in the
> > > > > > > Polaris entity (e.g. table).
> > > > > > > 
> > > > > > > Thoughts ?
> > > > > > > 
> > > > > > > Regards
> > > > > > > JB
> > > > > > > 
> > > > > > > On Wed, Sep 11, 2024 at 4:31 AM Aniket Kulkarni
> > > > > > > <aniket.kulka...@dremio.com.invalid> wrote:
> > > > > > > > For iceberg tables stored in AWS S3 buckets, knowing the region 
> > > > > > > > of
> > > > > the
> > > > > > > bucket is critical for engines using vended credentials (when
> > > > > configured)
> > > > > > > to access a table.
> > > > > > > > E.g - the vended credentials for AWS look like this
> > > > > > > > 
> > > > > > > > { "s3.access-key-id": "ASI....”,
> > > > > > > >    "s3.secret-access-key": "gbVT9PpFBY...”,
> > > > > > > >    "s3.session-token": "IQoJb3JpZ2luX2VjEN3//////////...”,
> > > > > > > >     "expiration-time": “1725572949000” }
> > > > > > > > 
> > > > > > > > An engine consuming this, would need to either infer (s3api
> > > > > > > get-bucket-location) the region or ask the end user to provide the
> > > > > region
> > > > > > > separately which misses the point of vended credentials.
> > > > > > > > A engine engine cannot use get-bucket-location, because the
> > > > > credential
> > > > > > > generation explicitly allows only s3:GetObject, 
> > > > > > > s3:GetObjectVersion,
> > > > > > > s3:PutObject, s3:DeletObject, s3:ListBucket for the table location
> > > > > prefix.
> > > > > > > Refer -
> > > > > > > 
> > > > > org.apache.polaris.core.storage.aws.AwsCredentialsStorageIntegration#policyString
> > > > > > > > I propose that
> > > > > > > > 
> > > > > > > > - the storage setup for S3 should have parameter for the bucket
> > > > > region
> > > > > > > (org.apache.polaris.core.storage.aws.AwsStorageConfigurationInfo)
> > > > > > > > - if the parameter is not specified, then Polaris attempts to 
> > > > > > > > look
> > > > > up
> > > > > > > (get-bucket-location) the region.
> > > > > > > > - the information is returned in vended credentials (if 
> > > > > > > > enabled) as
> > > > > > > "s3.region”:…
> > > > > > > > Note - another option could be to allow ’s3:GetBucketLocation’ 
> > > > > > > > in
> > > > > the
> > > > > > > policyString when generating vended credentials’ iam role, but 
> > > > > > > that
> > > > > is sub
> > > > > > > optimal and therefore I am not proposing it. It would engines to 
> > > > > > > make
> > > > > > > multiple get-bucket-location calls - one per table being looked 
> > > > > > > up.
> > > > > > > > --
> > > > > > > > aniket
> > > > > > > > 
> -- 
> Robert Stupp
> @snazy
> 

Reply via email to