[ 
https://issues.apache.org/jira/browse/SPARK-57518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090037#comment-18090037
 ] 

Anupam Yadav commented on SPARK-57518:
--------------------------------------

I would like to work on this. Plan: route the ThriftServer metadata operations 
(SparkGetCatalogsOperation, SparkGetSchemasOperation, SparkGetTablesOperation, 
SparkGetColumnsOperation) through CatalogManager so they honor DataSource V2 
catalogs and the default catalog, while preserving existing behavior on the 
default spark_catalog path (its v2 interface delegates to the V1 session 
catalog transparently). getCatalogs() will return the registered catalogs via 
CatalogManager.listCatalogs(); an unspecified catalog will resolve to the 
current catalog, consistent with Spark's existing resolution semantics (USE 
CATALOG / defaultCatalog). I will open a PR shortly.
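
To make the intended resolution rule concrete, here is a rough, self-contained 
sketch of it. It is not Spark code: CatalogResolutionSketch and all of its 
methods are hypothetical stand-ins for illustration only, and the real change 
would go through CatalogManager and the DSv2 catalog interfaces. The assumption 
it illustrates is that a null or empty catalog argument falls back to the 
current catalog, while an explicit catalog name is used as given.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT Spark's CatalogManager.
final class CatalogResolutionSketch {
    // Catalog name -> schema (namespace) names, kept in registration order.
    private final Map<String, List<String>> schemasByCatalog = new LinkedHashMap<>();
    private String currentCatalog;

    CatalogResolutionSketch(String defaultCatalog) {
        // The default catalog starts out as the current catalog.
        this.currentCatalog = defaultCatalog;
        schemasByCatalog.put(defaultCatalog, new ArrayList<>());
    }

    // Registers (or replaces) a catalog and the schemas it exposes.
    void register(String catalog, List<String> schemas) {
        schemasByCatalog.put(catalog, new ArrayList<>(schemas));
    }

    // Models the effect of USE CATALOG on the session.
    void useCatalog(String catalog) {
        if (!schemasByCatalog.containsKey(catalog)) {
            throw new IllegalArgumentException("Unknown catalog: " + catalog);
        }
        currentCatalog = catalog;
    }

    // Models getCatalogs(): every registered catalog, not only the V1 one.
    List<String> getCatalogs() {
        return new ArrayList<>(schemasByCatalog.keySet());
    }

    // Assumed rule: an unspecified catalog resolves to the current catalog.
    String resolveCatalog(String requested) {
        if (requested == null || requested.isEmpty()) {
            return currentCatalog;
        }
        if (!schemasByCatalog.containsKey(requested)) {
            throw new IllegalArgumentException("Unknown catalog: " + requested);
        }
        return requested;
    }

    // Models getSchemas(): list schemas from the resolved catalog.
    List<String> getSchemas(String requestedCatalog) {
        return new ArrayList<>(schemasByCatalog.get(resolveCatalog(requestedCatalog)));
    }

    public static void main(String[] args) {
        CatalogResolutionSketch session = new CatalogResolutionSketch("spark_catalog");
        session.register("spark_catalog", List.of("default"));
        session.register("mycat", List.of("db"));

        System.out.println(session.getCatalogs());
        System.out.println(session.getSchemas(null));            // current catalog
        session.useCatalog("mycat");
        System.out.println(session.getSchemas(null));            // now resolves to mycat
        System.out.println(session.getSchemas("spark_catalog")); // explicit catalog
    }
}
```

In this model, switching the current catalog changes what an unqualified 
metadata call returns, which is the same behavior the SQL path already has for 
unqualified names.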

> ThriftServer DatabaseMetaData getTables/getSchemas do not use DataSource V2 
> catalogs
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-57518
>                 URL: https://issues.apache.org/jira/browse/SPARK-57518
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.1.2
>            Reporter: Akihiro Okuno
>            Priority: Major
>
> Spark ThriftServer can execute SQL statements against DataSource V2 catalogs, 
> but its JDBC metadata operations still use the V1 SessionCatalog directly. 
> This causes inconsistent behavior for JDBC and BI clients.
> h2. Steps to reproduce
> 1. Start Spark ThriftServer with a DSv2 catalog configured, for example:
>  
> {code:java}
> spark.sql.catalog.mycat=<catalog implementation>
> spark.sql.defaultCatalog=mycat{code}
>  
> 2. Connect through JDBC/Beeline or a BI tool.
> 3. Run SQL statements such as:
>  
> {code:sql}
> SELECT * FROM db.table;
> SELECT * FROM mycat.db.table;
> SHOW TABLES IN mycat.db;{code}
>  
> These statements can resolve DSv2 catalog objects.
> 4. Use JDBC DatabaseMetaData APIs through the same connection, for example:
>  
> {code:java}
> DatabaseMetaData.getSchemas(...)
> DatabaseMetaData.getTables(...)
> DatabaseMetaData.getColumns(...){code}
>  
> h2. Expected behavior:
> JDBC metadata operations should be consistent with SQL name resolution. If a 
> DSv2 catalog is configured and selected explicitly or through 
> spark.sql.defaultCatalog/current catalog, ThriftServer should discover 
> schemas, tables, and columns from that catalog.
> h2. Actual behavior:
> JDBC metadata operations only reflect objects visible through the V1 
> SessionCatalog. DSv2 catalog schemas/tables may be queryable via SQL but 
> missing from BI tool schema/table discovery.
> h2. Impact:
> Some BI and SQL GUI tools rely on JDBC DatabaseMetaData APIs to populate 
> schema and table trees. Users can query DSv2 tables manually, but tools 
> cannot discover them through ThriftServer metadata APIs.
> h2. Code pointers:
> SparkOperation currently exposes the V1 SessionCatalog:
> {code:java}
> org.apache.spark.sql.hive.thriftserver.SparkOperation
>   final protected def catalog: SessionCatalog = sessionState.catalog{code}
> Metadata operations then use that V1 catalog directly:
>  
> {code:java}
> org.apache.spark.sql.hive.thriftserver.SparkGetSchemasOperation
>   catalog.listDatabases(...)
> {code}
> {code:java}
> org.apache.spark.sql.hive.thriftserver.SparkGetTablesOperation
>   catalog.listDatabases(...)
>   catalog.listTables(...)
>   catalog.getTablesByName(...){code}
> h2. Possible approach:
> Make ThriftServer metadata operations DSv2-aware by using CatalogManager and 
> catalog APIs where appropriate:
>  * getCatalogs: use catalogManager.listCatalogs
>  * getSchemas: resolve the requested catalog and use SupportsNamespaces
>  * getTables: use TableCatalog.listTables
>  * getColumns: use TableCatalog.loadTable(...).schema
>  * preserve existing V1 SessionCatalog behavior for spark_catalog, temp 
> views, and global temp views



