Thanks for the approval on the PR, Rulin! But a quick question on protocol: do I make the changes gh-yzou is asking for then ask you to approve again?
(This is my first contribution to Polaris so I don't know what is expected of me.) Regards, Phillip On Thu, Mar 5, 2026 at 2:35 PM Phillip Henry <[email protected]> wrote: > Hi, Rulin. > > I've done as you've suggested with the exception of adding > GCP_CREDENTIALS_PATH_PROPERTY and GCP_SCOPES_PROPERTY to > GcpAuthenticationParametersDpo as they would not be used in this ticket and > I would recommend adding them in another ticket that uses them. > > Having said that, I don't feel strongly about this and am happy to add if > you really want them in this ticket. > > Otherwise, could you please take another look at the PR? > > Regards, > > Phillip > > > On Wed, Mar 4, 2026 at 6:58 PM Rulin Xing <[email protected]> wrote: > >> Hey Phillip, >> >> Personally, I’m leaning toward following the Iceberg catalog properties as >> they are today. >> >> In the Iceberg SDK, there isn't a first-class property for project id, >> it's >> currently passed through the `header.x-goog-user-project`, even biglake's >> public doc follows this pattern. >> >> I'd also prefer not to mix BigLake-specific properties with authentication >> parameters since authentication is more generic, e.g., even for AWS Glue, >> we don't introduce a special class for it but implement a generic >> SigV4AuthenticationParameters. As I mentioned earlier, the project id is a >> property required by the BigLake IRC, but it's not an authentication >> parameter. >> >> Given that, my proposal would be: >> >> - Add an additional headers field in IcebergRestConnectionConfigInfo. >> - I think it’s okay not to enforce checking the existence of this >> property when the target catalog is GCP. >> IcebergRestConnectionConfigInfo is >> designed to connect to any Iceberg REST–compliant catalog, so if >> a property >> is only required by a specific catalog implementation, it shouldn’t >> be >> validated generically at this layer. >> - Introduce GcpAuthenticationParametersDpo, but limit it to two fields >> for now that follows *GoogleAuthManager*: >> - GCP_CREDENTIALS_PATH_PROPERTY: if this property is not provided, >> it means that we will rely on gcp auth sdk to detect the gcp >> credentials >> from env >> - GCP_SCOPES_PROPERTY >> >> Later, we can extend this to support impersonation flow and subscope logic >> by adding the necessary service identity-related functionality. >> >> Regards, >> Rulin >> >> On Wed, Mar 4, 2026 at 3:43 AM Phillip Henry <[email protected]> >> wrote: >> >> > Thanks, Rulin. >> > >> > Regarding points 1-3: would a step in the right direction be to call the >> > DTO *BigLake*AuthenticationParametersDpo rather than *Gcp* >> > AuthenticationParametersDpo? I'm not trying to solve all the possible >> GCP >> > possibilities as I think that is beyond the scope of this ticket. >> Rather, I >> > just wanted to connect to my BigLake metastore - something the code >> > currently enables. >> > >> > If this were acceptable to you, I'd then argue that quotaProject is >> > analogous to, say, SigV4AuthenticationParameters.signingRegion insofar >> as >> > both basically define where the request is being sent (albeit one >> > physically and one logically). If people disagree, I'm happy to put >> > quotaProject elsewhere but I will need a steer on exactly where it >> should >> > go. >> > >> > What I'm trying to avoid, however, is the scope of this ticket >> ballooning. >> > >> > Please let me know your thoughts. >> > >> > Regards, >> > >> > Phillip >> > >> > >> > >> > On Tue, Mar 3, 2026 at 6:47 PM Rulin Xing <[email protected]> wrote: >> > >> > > Hi Phillip, >> > > >> > > Thanks for starting this thread! I want to clarify a few points >> regarding >> > > to my comment: >> > > >> > > 1. GoogleAuthManager.java#L76-L77 >> > > < >> > > >> > >> https://github.com/apache/iceberg/blob/1.10.x/gcp/src/main/java/org/apache/iceberg/gcp/auth/GoogleAuthManager.java#L76-L77 >> > > >: >> > > GoogleAuthManager only relies on these two properties for now: >> > > gcp.auth.credentials-path and gcp.auth.scopes. google project id is >> > not >> > > needed by GoogleAuthManager >> > > 2. We should clearly separate authentication parameters from other >> > > connection configurations. IcebergRestConnectionConfigInfo is >> designed >> > > to connect to any remote catalog that complies with the Iceberg >> REST >> > > spec, >> > > while GcpAuthenticationParametersDpo should include only Google >> > > authentication–related properties. It should be generic enough to >> work >> > > with >> > > any catalog that uses Google Auth. >> > > 3. Google Auth doesn't need the project id, it's GCP big lake that >> > > requires this property: >> > > https://docs.cloud.google.com/biglake/docs/blms-rest-catalog >> > > - If we look at Polaris GCP storage config, we also don’t need >> to >> > > provide a project ID, it only requires the gcs service account: >> > > (which is >> > > not a good design since this property should be provided by >> > > catalog service >> > > and shouldn't be provided by end users): >> > > >> > > >> > >> https://github.com/apache/polaris/blob/release/1.3.x/spec/polaris-management-service.yml#L1162-L1171 >> > > - If customers host their catalog server on GCP and use Google >> Auth >> > > for authentication, they do not need to specify a project id >> > > when creating >> > > the Polaris external catalog entity. >> > > - For example, if they use GCP Proxy Service (API Gateway) to >> > > expose REST APIs and use google auth, no project id is >> required. >> > > - GCP BigLake requires both header.x-goog-user-project and >> > warehouse >> > > (catalog name). These are used to identify the catalog on the >> > server >> > > side >> > > because BigLake supports multiplexing (it’s a multi-tenant >> > > catalog service >> > > where each tenant can have multiple catalogs). Neither of these >> > > properties >> > > is related to authentication. >> > > - From a design perspective, this is not ideal. For >> comparison, >> > in >> > > Glue, a single property is used to achieve the same purpose, >> > > e.g.,: >> > > - warehouse=<aws_account_id>:s3tables/<catalog_name> >> > > - Here, aws_account_id is equivalent to the Google project >> > ID, >> > > - and s3tables/<catalog_name> is equivalent to BigLake’s >> > > warehouse. >> > > 4. Another part that is not mentioned is the service identity >> > > part, we can discuss this later >> > > >> > > Thanks. >> > > Rulin >> > > >> > > On Mon, Mar 2, 2026 at 6:40 AM Phillip Henry <[email protected] >> > >> > > wrote: >> > > >> > > > Regarding the points Rulin Xing raises on ticket 3729 >> > > > <https://github.com/apache/polaris/pull/3729#discussion_r2866294655 >> >, >> > I >> > > > wanted to get some feedback from the community on the following. >> > > > >> > > > >> > > > 1. I'd argue that project id property should not live on >> > > > IcebergRestConnectionConfigInfo as it's GCP specific AFAIK >> > > > 2. If it were to live in IcebergRestConnectionConfigInfo as a >> map of >> > > > properties in (I think that's what you're suggesting RX - >> correct me >> > > if >> > > > I'm >> > > > wrong) then its presence could not be enforced for GCP calls. >> > > > 3. GcpAuthenticationParametersDpo "doesn't contain any >> > > > authentication-related info" but it does contain only what is >> > > necessary >> > > > to >> > > > trigger AuthManagers to use GCP and also to provide the means to >> do >> > > that >> > > > via its asIcebergCatalogProperties and >> > asAuthenticationParametersModel >> > > > methods. I'd argue that this way of doing it leverages the >> machinery >> > > > that's >> > > > already in place and minimally impacts the rest of the codebase. >> > > > >> > > > >> > > > So, I'd propose removing the warehouse field as RX has helpfully >> > pointed >> > > > out but I think GcpAuthenticationParametersDpo should stay even if >> its >> > > DTO >> > > > counterpart is revised. >> > > > >> > > > Thoughts? >> > > > >> > > >> > >> >
