[ 
https://issues.apache.org/jira/browse/SPARK-50759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-50759.
---------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

Issue resolved by pull request 50085
[https://github.com/apache/spark/pull/50085]

> Spark catalog api bug when working with non-hms based catalog
> -------------------------------------------------------------
>
>                 Key: SPARK-50759
>                 URL: https://issues.apache.org/jira/browse/SPARK-50759
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0, 4.0.0, 3.5.4
>            Reporter: Sunny malik
>            Assignee: Sunny malik
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Hi
> I am encountering issues while working with a REST-based catalog. My Spark 
> session is configured with a default catalog that uses the REST-based 
> implementation.
> The {{SparkSession.catalog}} API does not function correctly with the 
> REST-based catalog. This issue has been tested and observed in Spark 3.4.
> ----------------------------------------------------------------------------------
> ${SPARK_HOME}/bin/spark-shell --master local[*] \
>   --driver-memory 2g \
>   --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
>   --conf spark.sql.catalog.iceberg.uri=https://xx.xxx.xxxx.domain.com \
>   --conf spark.sql.warehouse.dir=$SQL_WAREHOUSE_DIR \
>   --conf spark.sql.defaultCatalog=iceberg \
>   --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
>   --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog
> scala> spark.catalog.currentCatalog
> res1: String = iceberg
> scala> spark.sql("select * from restDb.restTable").show
> +---+----------+
> | id|      data|
> +---+----------+
> |  1|some_value|
> +---+----------+
> scala> spark.catalog.tableExists("restDb.restTable")
> *res3: Boolean = true*
> scala> spark.catalog.tableExists("restDb", "restTable")
> *res4: Boolean = false*
> ----------------------------------------------------------------------------------
>  
> The API spark.catalog.tableExists(String databaseName, String tableName)
> is only meant to work with an HMS-based catalog
> ([https://github.com/apache/spark/blob/5a91172c019c119e686f8221bbdb31f59d3d7776/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L224])
>  
> The API spark.catalog.tableExists(String tableName)
> is meant to work with both HMS and non-HMS based catalogs.
>  
>  
> Suggested resolutions
> 1. Make spark.catalog.tableExists(String databaseName, String tableName)
> throw a runtime exception if the session catalog is a non-HMS based catalog.
> 2. Deprecate the HMS-specific API in a newer Spark release, as Spark already
> has an API that works with both HMS and non-HMS based catalogs.
>  
> Thanks
> Sunny
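
For readers on an affected version, the transcript above suggests a workaround: pass a single qualified name to the one-argument tableExists instead of calling the two-argument overload. The sketch below is only an illustration of that idea. QualifiedName is a hypothetical helper, not part of Spark, and the backtick quoting assumes the one-argument overload parses its argument as a multipart SQL identifier.

```scala
// Hypothetical helper (not part of Spark): builds a dotted, backtick-quoted
// identifier from its parts so it can be passed to the single-argument
// spark.catalog.tableExists(tableName).
object QualifiedName {
  // Wrap one name part in backticks, doubling any backtick it contains.
  def quote(part: String): String =
    "`" + part.replace("`", "``") + "`"

  // Join the quoted parts with dots, e.g. database and table.
  def of(parts: String*): String =
    parts.map(quote).mkString(".")
}

// Assumed usage inside spark-shell, where `spark` is the active session:
//   spark.catalog.tableExists(QualifiedName.of("restDb", "restTable"))
```

With the session shown in the report, the unquoted form "restDb.restTable" already returned true, so the quoting only matters for names that contain dots or other special characters.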



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
