[
https://issues.apache.org/jira/browse/SPARK-57518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090037#comment-18090037
]
Anupam Yadav commented on SPARK-57518:
--------------------------------------
I would like to work on this. Plan: route the ThriftServer metadata operations
(SparkGetCatalogsOperation, SparkGetSchemasOperation, SparkGetTablesOperation,
SparkGetColumnsOperation) through CatalogManager so they honor DataSource V2
catalogs and the default catalog, while preserving existing behavior on the
default spark_catalog path (its v2 interface delegates to the V1 session
catalog transparently). getCatalogs() will return the registered catalogs via
CatalogManager.listCatalogs(); an unspecified catalog will resolve to the
current catalog, consistent with Spark's existing resolution semantics (USE
CATALOG / defaultCatalog). Will open a PR shortly.
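
As a rough illustration of the resolution rule described above (not the
actual patch), here is a self-contained Java toy model. ToyCatalog,
ToyCatalogManager, demoManager and the sample schema/table names are all
hypothetical and are not Spark classes. Treating both null and an empty
string as "unspecified" is an assumption made for this sketch. It only shows
the intended shape: getCatalogs lists every registered catalog, and an
unspecified catalog falls back to the current one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only. ToyCatalog and ToyCatalogManager are hypothetical stand-ins,
// NOT Spark's CatalogManager / CatalogPlugin classes.
final class CatalogResolutionSketch {

    /** Hypothetical catalog: maps a schema name to the tables it contains. */
    static final class ToyCatalog {
        final Map<String, List<String>> schemas = new LinkedHashMap<>();

        ToyCatalog schema(String name, String... tables) {
            schemas.put(name, List.of(tables));
            return this;
        }
    }

    /** Hypothetical registry of catalogs plus the session's current catalog. */
    static final class ToyCatalogManager {
        private final Map<String, ToyCatalog> catalogs = new LinkedHashMap<>();
        private String currentCatalog = "spark_catalog";

        void register(String name, ToyCatalog catalog) {
            catalogs.put(name, catalog);
        }

        void setCurrentCatalog(String name) {
            if (!catalogs.containsKey(name)) {
                throw new IllegalArgumentException("Unknown catalog: " + name);
            }
            currentCatalog = name;
        }

        /** Returns all registered catalog names in registration order. */
        List<String> listCatalogs() {
            return new ArrayList<>(catalogs.keySet());
        }

        /** Assumption: null or empty means "unspecified" and maps to the current catalog. */
        ToyCatalog resolve(String requested) {
            String name = (requested == null || requested.isEmpty()) ? currentCatalog : requested;
            ToyCatalog catalog = catalogs.get(name);
            if (catalog == null) {
                throw new IllegalArgumentException("Unknown catalog: " + name);
            }
            return catalog;
        }
    }

    static List<String> getSchemas(ToyCatalogManager manager, String catalog) {
        return new ArrayList<>(manager.resolve(catalog).schemas.keySet());
    }

    static List<String> getTables(ToyCatalogManager manager, String catalog, String schema) {
        return manager.resolve(catalog).schemas.getOrDefault(schema, List.of());
    }

    /** Mirrors the reproduction setup: a catalog "mycat" registered and made current. */
    static ToyCatalogManager demoManager() {
        ToyCatalogManager manager = new ToyCatalogManager();
        manager.register("spark_catalog", new ToyCatalog().schema("default", "legacy_t"));
        manager.register("mycat", new ToyCatalog().schema("db", "table"));
        manager.setCurrentCatalog("mycat");
        return manager;
    }

    public static void main(String[] args) {
        ToyCatalogManager manager = demoManager();
        System.out.println(manager.listCatalogs());
        System.out.println(getSchemas(manager, null));
        System.out.println(getTables(manager, "spark_catalog", "default"));
    }
}
```

In this model an explicit "spark_catalog" argument still reaches the legacy
objects, which is the behavior the plan aims to preserve, while a missing
catalog argument sees what SQL name resolution would see.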
> ThriftServer DatabaseMetaData getTables/getSchemas do not use DataSource V2
> catalogs
> ------------------------------------------------------------------------------------
>
> Key: SPARK-57518
> URL: https://issues.apache.org/jira/browse/SPARK-57518
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.1.2
> Reporter: Akihiro Okuno
> Priority: Major
>
> Spark ThriftServer can execute SQL statements against DataSource V2 catalogs,
> but its JDBC metadata operations still use the V1 SessionCatalog directly.
> This causes inconsistent behavior for JDBC and BI clients.
> h2. Steps to reproduce
> 1. Start Spark ThriftServer with a DSv2 catalog configured, for example:
>
> {code:java}
> spark.sql.catalog.mycat=<catalog implementation>
> spark.sql.defaultCatalog=mycat{code}
>
> 2. Connect through JDBC/Beeline or a BI tool.
> 3. Run SQL statements such as:
>
> {code:java}
> SELECT * FROM db.table;
> SELECT * FROM mycat.db.table;
> SHOW TABLES IN mycat.db;{code}
>
> These statements can resolve DSv2 catalog objects.
> 4. Use JDBC DatabaseMetaData APIs through the same connection, for example:
>
> {code:java}
> DatabaseMetaData.getSchemas(...)
> DatabaseMetaData.getTables(...)
> DatabaseMetaData.getColumns(...){code}
>
> h2. Expected behavior:
> JDBC metadata operations should be consistent with SQL name resolution. If a
> DSv2 catalog is configured and selected explicitly or through
> spark.sql.defaultCatalog/current catalog, ThriftServer should discover
> schemas, tables, and columns from that catalog.
> h2. Actual behavior:
> JDBC metadata operations only reflect objects visible through the V1
> SessionCatalog. DSv2 catalog schemas/tables may be queryable via SQL but
> missing from BI tool schema/table discovery.
> h2. Impact:
> Some BI and SQL GUI tools rely on JDBC DatabaseMetaData APIs to populate
> schema and table trees. Users can query DSv2 tables manually, but tools
> cannot discover them through ThriftServer metadata APIs.
> h2. Code pointers:
> SparkOperation currently exposes the V1 SessionCatalog:
> {code:java}
> org.apache.spark.sql.hive.thriftserver.SparkOperation
> final protected def catalog: SessionCatalog = sessionState.catalog{code}
> Metadata operations then use that V1 catalog directly:
>
> {code:java}
> org.apache.spark.sql.hive.thriftserver.SparkGetSchemasOperation
> catalog.listDatabases(...)
> {code}
> {code:java}
> org.apache.spark.sql.hive.thriftserver.SparkGetTablesOperation
> catalog.listDatabases(...)
> catalog.listTables(...)
> catalog.getTablesByName(...){code}
> h2. Possible approach:
> Make ThriftServer metadata operations DSv2-aware by using CatalogManager and
> catalog APIs where appropriate:
> * getCatalogs: use catalogManager.listCatalogs
> * getSchemas: resolve the requested catalog and use SupportsNamespaces
> * getTables: use TableCatalog.listTables
> * getColumns: use TableCatalog.loadTable(...).schema
> * preserve existing V1 SessionCatalog behavior for spark_catalog, temp
> views, and global temp views