Akihiro Okuno created SPARK-57518:
-------------------------------------

             Summary: ThriftServer DatabaseMetaData getTables/getSchemas do not use DataSource V2 catalogs
                 Key: SPARK-57518
                 URL: https://issues.apache.org/jira/browse/SPARK-57518
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.1.2
            Reporter: Akihiro Okuno


Spark ThriftServer can execute SQL statements against DataSource V2 catalogs, 
but its JDBC metadata operations still use the V1 SessionCatalog directly. This 
causes inconsistent behavior for JDBC and BI clients.
h2. Steps to reproduce

1. Start Spark ThriftServer with a DSv2 catalog configured, for example:

{noformat}
spark.sql.catalog.mycat=<catalog implementation>
spark.sql.defaultCatalog=mycat
{noformat}

2. Connect through JDBC/Beeline or a BI tool.

3. Run SQL statements such as:

{code:sql}
SELECT * FROM db.table;
SELECT * FROM mycat.db.table;
SHOW TABLES IN mycat.db;{code}

These statements can resolve DSv2 catalog objects.

4. Use JDBC DatabaseMetaData APIs through the same connection, for example:

 
{code:java}
DatabaseMetaData.getSchemas(...)
DatabaseMetaData.getTables(...)
DatabaseMetaData.getColumns(...){code}
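
As a rough illustration of step 4, the helper below shows how a client might call the metadata APIs. It is a minimal sketch, not part of the report: the class name MetadataProbe and its method names are hypothetical, and the "%" patterns simply ask the driver for everything. Only standard java.sql calls are used.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Spark or the Hive JDBC driver.
final class MetadataProbe {
    private MetadataProbe() {}

    // Collects the schema names the driver reports for the given catalog.
    static List<String> listSchemas(DatabaseMetaData md, String catalog) throws SQLException {
        List<String> names = new ArrayList<>();
        try (ResultSet rs = md.getSchemas(catalog, "%")) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_SCHEM"));
            }
        }
        return names;
    }

    // Collects "schema.table" for every table the driver reports.
    // Passing null for the types argument requests all table types.
    static List<String> listTables(DatabaseMetaData md, String catalog, String schemaPattern)
            throws SQLException {
        List<String> names = new ArrayList<>();
        try (ResultSet rs = md.getTables(catalog, schemaPattern, "%", null)) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }
}
```

To try it against a running ThriftServer, obtain a Connection through DriverManager.getConnection with the server's jdbc:hive2:// URL (host and port depend on the deployment) and pass connection.getMetaData() together with the catalog name, for example "mycat". With the behavior described in this issue, tables that SHOW TABLES IN mycat.db returns would be expected to be absent from the result.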
 
h2. Expected behavior

JDBC metadata operations should be consistent with SQL name resolution. If a 
DSv2 catalog is configured and selected explicitly or through 
spark.sql.defaultCatalog/current catalog, ThriftServer should discover schemas, 
tables, and columns from that catalog.
h2. Actual behavior

JDBC metadata operations only reflect objects visible through the V1 
SessionCatalog. DSv2 catalog schemas/tables may be queryable via SQL but 
missing from BI tool schema/table discovery.
h2. Impact

Some BI and SQL GUI tools rely on JDBC DatabaseMetaData APIs to populate schema 
and table trees. Users can query DSv2 tables manually, but tools cannot 
discover them through ThriftServer metadata APIs.
h2. Code pointers

SparkOperation currently exposes the V1 SessionCatalog:
{code:scala}
// In org.apache.spark.sql.hive.thriftserver.SparkOperation
final protected def catalog: SessionCatalog = sessionState.catalog{code}
Metadata operations then use that V1 catalog directly:

{code:scala}
// In org.apache.spark.sql.hive.thriftserver.SparkGetSchemasOperation
catalog.listDatabases(...)
{code}
{code:scala}
// In org.apache.spark.sql.hive.thriftserver.SparkGetTablesOperation
catalog.listDatabases(...)
catalog.listTables(...)
catalog.getTablesByName(...){code}
h2. Possible approach

Make ThriftServer metadata operations DSv2-aware by using CatalogManager and 
catalog APIs where appropriate:
 * getCatalogs: use catalogManager.listCatalogs
 * getSchemas: resolve the requested catalog and use SupportsNamespaces
 * getTables: use TableCatalog.listTables
 * getColumns: use TableCatalog.loadTable(...).schema
 * preserve existing V1 SessionCatalog behavior for spark_catalog, temp views, 
and global temp views
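
The routing idea in the list above can be sketched as follows. This is only an illustration under stated assumptions: TableLister and MetadataRouter are hypothetical stand-ins, not Spark classes, and the fallback rules (use the current catalog when none is requested, return nothing for an unknown catalog) are choices of this sketch rather than settled behavior. The only name taken from Spark is spark_catalog, the session catalog.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a source of table names; not a Spark interface.
interface TableLister {
    List<String> listTables(String schema);
}

// Hypothetical router: keeps the V1 path for spark_catalog, uses V2 catalogs otherwise.
final class MetadataRouter {
    static final String SESSION_CATALOG = "spark_catalog";

    private final TableLister v1Path;                  // stands in for the SessionCatalog-based code
    private final Map<String, TableLister> v2Catalogs; // stands in for catalogs resolved by name

    MetadataRouter(TableLister v1Path, Map<String, TableLister> v2Catalogs) {
        this.v1Path = v1Path;
        this.v2Catalogs = v2Catalogs;
    }

    List<String> getTables(String requestedCatalog, String currentCatalog, String schema) {
        // Assumption of this sketch: no catalog in the request means "use the current catalog".
        String name = (requestedCatalog == null || requestedCatalog.isEmpty())
                ? currentCatalog
                : requestedCatalog;
        if (SESSION_CATALOG.equals(name)) {
            return v1Path.listTables(schema); // existing V1 behavior is preserved
        }
        TableLister v2 = v2Catalogs.get(name);
        if (v2 == null) {
            return List.of(); // unknown catalog: report nothing (a choice of this sketch)
        }
        return v2.listTables(schema);
    }
}
```

In an actual change, the V2 branch would presumably call the catalog APIs named in the list (for example TableCatalog.listTables), while temp views and global temp views would stay on the V1 branch.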


