rahil-c commented on code in PR #1862: URL: https://github.com/apache/polaris/pull/1862#discussion_r2232267318
##########
plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/PolarisCatalogUtils.java:
##########
@@ -91,6 +109,85 @@ public static Table loadSparkTable(GenericTable genericTable) {
         provider, new CaseInsensitiveStringMap(tableProperties), scala.Option.empty());
   }
 
+  /**
+   * Extract catalog name from Spark session configuration. Looks for configuration like:
+   * spark.sql.catalog.<CATALOG_NAME>=org.apache.polaris.spark.SparkCatalog
+   */
+  private static String getCatalogName() {
+    SparkSession spark = SparkSession.active();
+    String catalogPrefix = "spark.sql.catalog.";
+    String polarisSparkCatalog = "org.apache.polaris.spark.SparkCatalog";
+
+    scala.collection.Iterator<scala.Tuple2<String, String>> configIterator =
+        spark.conf().getAll().iterator();
+    while (configIterator.hasNext()) {
+      scala.Tuple2<String, String> config = configIterator.next();
+      String key = config._1();
+      String value = config._2();
+
+      if (key.startsWith(catalogPrefix) && polarisSparkCatalog.equals(value)) {
+        return key.substring(catalogPrefix.length());
+      }
+    }
+
+    throw new IllegalStateException(
+        "Could not obtain Polaris catalog identifier."
+            + "Expected following configuration to be set in session: spark.sql.catalog.<CATALOG_NAME>=org.apache.polaris.spark.SparkCatalog");
+  }
+
+  public static Table loadV1SparkHudiTable(GenericTable genericTable, Identifier identifier) {
+    Map<String, String> tableProperties = Maps.newHashMap();
+    tableProperties.putAll(genericTable.getProperties());
+    tableProperties.put(
+        TABLE_PATH_KEY, genericTable.getProperties().get(TableCatalog.PROP_LOCATION));
+
+    // Need full identifier in order to construct CatalogTable correctly for Hudi
+    String namespacePath = String.join(".", identifier.namespace());
+    TableIdentifier tableIdentifier =
+        new TableIdentifier(
+            identifier.name(), Option.apply(namespacePath), Option.apply(getCatalogName()));

Review Comment:
   Hudi expects the Spark `V1Table` returned by Polaris to carry this catalog identifier. Because Hudi currently uses Spark DSv1, Spark invokes the `FindDataSourceTable` rule, which reads the catalog through `table.identifier.catalog.get`. If the table returned by the catalog does not include this catalog identifier, the `.get` call hits an NPE on the Spark side and errors out. We therefore need this for the integration to work.
   
   The rule is defined in `DataSourceStrategy.scala` in the Spark code base.
   
   <img width="1359" height="251" alt="Screenshot 2025-07-25 at 4 57 12 PM" src="https://github.com/user-attachments/assets/3d586858-2696-434f-b751-158de506a8af" />
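   To make the failure mode in the review comment concrete, here is a minimal, Spark-free sketch. `TableId` and `qualifiedName` are hypothetical stand-ins, not Spark or Polaris classes; they only mimic a caller that unconditionally unwraps an optional catalog the way `table.identifier.catalog.get` does. With `java.util.Optional`, as with a Scala `Option`, calling `get` on an empty value throws `NoSuchElementException`; the exact exception that surfaces through Spark may differ.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

public class CatalogIdentifierSketch {

  /** Hypothetical stand-in for a table identifier with optional database and catalog parts. */
  public static final class TableId {
    final String table;
    final Optional<String> database;
    final Optional<String> catalog;

    public TableId(String table, Optional<String> database, Optional<String> catalog) {
      this.table = table;
      this.database = database;
      this.catalog = catalog;
    }
  }

  /** Mimics a caller that assumes the catalog part is always present. */
  public static String qualifiedName(TableId id) {
    // Unconditional unwrap: throws NoSuchElementException when the catalog is absent.
    String catalog = id.catalog.get();
    String database = id.database.orElse("default");
    return catalog + "." + database + "." + id.table;
  }

  public static void main(String[] args) {
    // Identifier built with the catalog name, as the PR does via Option.apply(getCatalogName()).
    TableId withCatalog = new TableId("t1", Optional.of("ns1"), Optional.of("polaris"));
    System.out.println(qualifiedName(withCatalog)); // prints "polaris.ns1.t1"

    // Identifier built without a catalog: the unwrap fails.
    TableId withoutCatalog = new TableId("t1", Optional.of("ns1"), Optional.empty());
    try {
      qualifiedName(withoutCatalog);
    } catch (NoSuchElementException e) {
      System.out.println("lookup failed: catalog part is missing");
    }
  }
}
```

   Populating the catalog part when the identifier is constructed keeps the fix on the Polaris side and avoids depending on how the downstream rule handles a missing value.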