joyhaldar opened a new issue, #14422:
URL: https://github.com/apache/iceberg/issues/14422

   ### Query engine
   
   Spark locally and GCP Dataproc Serverless on the cloud.
   
   ### Question
   
   _**Description:**_
   I'm confused about the correct configuration properties for 
**BigQueryMetastoreCatalog** and believe there may be an inconsistency or 
documentation gap.
   
   _**Context:**_
   When using BigQueryMetastoreCatalog on Google Dataproc, the configuration 
from [Google's official 
documentation](https://docs.cloud.google.com/biglake/docs/configure-blms) works 
fine:
   
   ```
   spark.sql.catalog.my_catalog.gcp_project=PROJECT_ID
   spark.sql.catalog.my_catalog.gcp_location=LOCATION
   ```
   
   However, when I try to use the Apache Iceberg JARs directly (e.g., 
iceberg:1.10.0, iceberg-bigquery:1.10.0) with Spark running elsewhere 
(outside Dataproc), this configuration doesn't work.
   
   **_Investigation:_**
   Looking at the Iceberg source code 
(**[BigQueryMetastoreCatalog.java](https://github.com/apache/iceberg/blob/main/bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryMetastoreCatalog.java)**),
 the properties are defined as:
   ```
   public static final String PROJECT_ID = "gcp.bigquery.project-id";
   public static final String GCP_LOCATION = "gcp.bigquery.location";
   ```
   
   This suggests the configuration should be:
   ```
   spark.sql.catalog.my_catalog.gcp.bigquery.project-id=PROJECT_ID
   spark.sql.catalog.my_catalog.gcp.bigquery.location=LOCATION
   ```
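
   To make the key layout concrete, here is a minimal sketch that assembles 
the full set of Spark settings from the two constants quoted above. The class 
`BigQueryCatalogProps` and its `build` helper are hypothetical names used only 
for illustration. The `catalog-impl` value is inferred from the source path 
linked above, and other settings a real job may need (such as a warehouse 
location) are left out. Whether these keys behave as expected outside Dataproc 
is exactly what this issue is asking, so treat this as an assumption rather 
than a confirmed setup.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical helper, not part of Iceberg: it only builds configuration strings.
   final class BigQueryCatalogProps {
       // Property suffixes copied from the constants quoted from BigQueryMetastoreCatalog.java.
       static final String PROJECT_ID = "gcp.bigquery.project-id";
       static final String GCP_LOCATION = "gcp.bigquery.location";

       // Returns Spark conf entries for a catalog named catalogName, in insertion order.
       static Map<String, String> build(String catalogName, String projectId, String location) {
           String prefix = "spark.sql.catalog." + catalogName;
           Map<String, String> conf = new LinkedHashMap<>();
           conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
           // Assumed implementation class, taken from the package path of the linked source file.
           conf.put(prefix + ".catalog-impl",
                    "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog");
           conf.put(prefix + "." + PROJECT_ID, projectId);
           conf.put(prefix + "." + GCP_LOCATION, location);
           return conf;
       }

       public static void main(String[] args) {
           // Print the entries in a form that can be pasted after spark-submit.
           build("my_catalog", "PROJECT_ID", "LOCATION")
               .forEach((key, value) -> System.out.println("--conf " + key + "=" + value));
       }
   }
   ```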
   
   **_Questions:_**
   1. Why does the Google documentation approach (gcp_project, gcp_location) 
work on Dataproc but not with the standard Iceberg JAR?
   2. Is there a configuration translation or aliasing layer in the 
Dataproc-provided JAR 
(gs://spark-lib/bigquery/iceberg-bigquery-catalog-1.6.1-1.0.1-beta.jar) that's 
not present in the Maven Central release?
   3. What are the correct property names users should use when running Spark 
with Iceberg outside of Dataproc?
   4. Should the Iceberg codebase support both property name formats for 
compatibility, or should Google's documentation be updated?
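
   For question 4, supporting both formats could in principle be a small 
normalization step that runs before the catalog reads its properties. The 
sketch below is purely hypothetical: `PropertyAliases` and `normalize` are 
made-up names, and I don't know whether the Dataproc-provided JAR does 
anything like this. It only illustrates mapping the `gcp_project` / 
`gcp_location` keys onto the keys defined in the Iceberg source, with the 
Iceberg key winning when both are set.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical sketch of an alias layer; not an existing Iceberg or Dataproc class.
   final class PropertyAliases {
       // Key from Google's documentation -> key defined in BigQueryMetastoreCatalog.
       static final Map<String, String> ALIASES = Map.of(
           "gcp_project", "gcp.bigquery.project-id",
           "gcp_location", "gcp.bigquery.location");

       // Returns a copy of props in which aliased keys are renamed to the Iceberg keys.
       static Map<String, String> normalize(Map<String, String> props) {
           Map<String, String> out = new HashMap<>(props);
           ALIASES.forEach((alias, canonical) -> {
               String value = out.remove(alias);
               if (value != null) {
                   // Keep an explicitly set Iceberg key if one is already present.
                   out.putIfAbsent(canonical, value);
               }
           });
           return out;
       }
   }
   ```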
   
   **_Expected Behavior:_**
   Documentation on which property names to use, and ideally support for both 
formats to avoid user confusion.
   
   **_Actual Behavior:_**
   Users following Google's documentation may face issues when using Iceberg 
JARs from Maven Central outside of Dataproc environments.
   
   
   Also, please correct me if I am wrong or way off.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

