Akihiro Okuno created SPARK-57518:
-------------------------------------
Summary: ThriftServer DatabaseMetaData getTables/getSchemas do not
use DataSource V2 catalogs
Key: SPARK-57518
URL: https://issues.apache.org/jira/browse/SPARK-57518
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.1.2
Reporter: Akihiro Okuno
Spark ThriftServer can execute SQL statements against DataSource V2 (DSv2)
catalogs, but its JDBC metadata operations still read from the V1
SessionCatalog directly. As a result, JDBC and BI clients see a different set
of objects through metadata calls than they can reach through SQL.
h2. Steps to reproduce
1. Start Spark ThriftServer with a DSv2 catalog configured, for example:
{code:java}
spark.sql.catalog.mycat=<catalog implementation>
spark.sql.defaultCatalog=mycat{code}
2. Connect through JDBC/Beeline or a BI tool.
3. Run SQL statements such as:
{code:sql}
SELECT * FROM db.table;
SELECT * FROM mycat.db.table;
SHOW TABLES IN mycat.db;{code}
These statements resolve objects in the DSv2 catalog as expected.
4. Use JDBC DatabaseMetaData APIs through the same connection, for example:
{code:java}
DatabaseMetaData.getSchemas(...)
DatabaseMetaData.getTables(...)
DatabaseMetaData.getColumns(...){code}
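Steps 3 and 4 can be compared from a single connection. The following is a minimal sketch, not a verified reproduction: the class and method names are hypothetical, the JDBC URL must be supplied by the caller, a suitable JDBC driver for ThriftServer is assumed to be on the classpath, and the table name is assumed to be the second column of the SHOW TABLES result.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Set;
import java.util.TreeSet;

public class MetadataRepro {

    // Table names as seen by SQL name resolution (step 3).
    static Set<String> tablesViaSql(Connection conn, String namespace) throws SQLException {
        Set<String> names = new TreeSet<>();
        // The namespace is concatenated only because this is a throwaway repro.
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW TABLES IN " + namespace)) {
            while (rs.next()) {
                // Assumption: column 2 of SHOW TABLES holds the table name.
                names.add(rs.getString(2));
            }
        }
        return names;
    }

    // Table names as seen by the JDBC metadata API (step 4).
    static Set<String> tablesViaMetadata(Connection conn, String schema) throws SQLException {
        Set<String> names = new TreeSet<>();
        try (ResultSet rs = conn.getMetaData().getTables(null, schema, "%", null)) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    // Tables that SQL can resolve but metadata discovery does not report.
    static Set<String> missingFromMetadata(Set<String> viaSql, Set<String> viaMetadata) {
        Set<String> missing = new TreeSet<>(viaSql);
        missing.removeAll(viaMetadata);
        return missing;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: MetadataRepro <jdbc-url>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            Set<String> viaSql = tablesViaSql(conn, "mycat.db");
            Set<String> viaMetadata = tablesViaMetadata(conn, "db");
            System.out.println("missing from getTables: "
                + missingFromMetadata(viaSql, viaMetadata));
        }
    }
}
```

A non-empty "missing" set for a namespace that exists only in the DSv2 catalog would match the behavior described in this report.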
h2. Expected behavior:
JDBC metadata operations should be consistent with SQL name resolution. If a
DSv2 catalog is configured and selected explicitly or through
spark.sql.defaultCatalog/current catalog, ThriftServer should discover schemas,
tables, and columns from that catalog.
h2. Actual behavior:
JDBC metadata operations reflect only objects visible through the V1
SessionCatalog. Schemas and tables in a DSv2 catalog can be queryable via SQL
yet missing from BI tool schema/table discovery.
h2. Impact:
Some BI and SQL GUI tools rely on JDBC DatabaseMetaData APIs to populate schema
and table trees. Users can query DSv2 tables manually, but tools cannot
discover them through ThriftServer metadata APIs.
h2. Code pointers:
SparkOperation currently exposes the V1 SessionCatalog:
{code:scala}
// org.apache.spark.sql.hive.thriftserver.SparkOperation
final protected def catalog: SessionCatalog = sessionState.catalog{code}
Metadata operations then use that V1 catalog directly:
{code:scala}
// org.apache.spark.sql.hive.thriftserver.SparkGetSchemasOperation
catalog.listDatabases(...)
{code}
{code:scala}
// org.apache.spark.sql.hive.thriftserver.SparkGetTablesOperation
catalog.listDatabases(...)
catalog.listTables(...)
catalog.getTablesByName(...){code}
h2. Possible approach:
Make ThriftServer metadata operations DSv2-aware by using CatalogManager and
catalog APIs where appropriate:
* getCatalogs: use catalogManager.listCatalogs
* getSchemas: resolve the requested catalog and use SupportsNamespaces
* getTables: use TableCatalog.listTables
* getColumns: use TableCatalog.loadTable(...).schema
* preserve existing V1 SessionCatalog behavior for spark_catalog, temp views,
and global temp views
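The routing idea behind this list can be sketched independently of Spark. The model below is illustrative only: TableLister, MetadataRouting, and their methods are hypothetical stand-ins rather than Spark APIs, and temp views and global temp views are left out. It shows a getTables request falling back to the current catalog when none is given, going to the V1 path for spark_catalog, and going to the named DSv2 catalog otherwise.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetadataRouting {

    /** Hypothetical stand-in for anything that can list tables; not a Spark interface. */
    interface TableLister {
        List<String> listTables(String namespace);
    }

    // Name of Spark's built-in session catalog.
    static final String SESSION_CATALOG = "spark_catalog";

    private final TableLister v1SessionCatalog;
    private final Map<String, TableLister> v2Catalogs;
    private final String currentCatalog;

    MetadataRouting(TableLister v1SessionCatalog,
                    Map<String, TableLister> v2Catalogs,
                    String currentCatalog) {
        this.v1SessionCatalog = v1SessionCatalog;
        this.v2Catalogs = v2Catalogs;
        this.currentCatalog = currentCatalog;
    }

    List<String> getTables(String requestedCatalog, String namespace) {
        // No catalog in the request: fall back to the session's current catalog.
        String name = (requestedCatalog == null || requestedCatalog.isEmpty())
            ? currentCatalog
            : requestedCatalog;
        // Keep the existing V1 behavior for the session catalog.
        if (SESSION_CATALOG.equals(name)) {
            return v1SessionCatalog.listTables(namespace);
        }
        // Otherwise delegate to the configured DSv2 catalog, if there is one.
        TableLister v2 = v2Catalogs.get(name);
        return v2 == null ? Collections.<String>emptyList() : v2.listTables(namespace);
    }

    public static void main(String[] args) {
        Map<String, TableLister> v2 = new HashMap<>();
        v2.put("mycat", ns -> Collections.singletonList(ns + ".dsv2_table"));
        MetadataRouting routing = new MetadataRouting(
            ns -> Collections.singletonList(ns + ".v1_table"), v2, "mycat");
        System.out.println(routing.getTables(null, "db"));
        System.out.println(routing.getTables(SESSION_CATALOG, "db"));
    }
}
```

In this model an unknown catalog name yields an empty result; whether ThriftServer should return an empty result or raise an error in that case is an open design choice.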