RussellSpitzer commented on a change in pull request #1784:
URL: https://github.com/apache/iceberg/pull/1784#discussion_r526591588
##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -128,16 +129,35 @@
return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
}
+  private static Dataset<Row> loadMetadataTableFromCatalog(SparkSession spark, String tableName, String tableLocation,
+                                                           MetadataTableType type) {
+    DataFrameReader dataFrameReader = spark.read().format("iceberg");
+    if (tableName.startsWith("spark_catalog")) {
+      // Due to the design of Spark, we cannot pass multi-element namespaces to the session catalog.
+      // We also don't know whether the catalog is Hive or Hadoop based, so we can't just load one way or the other.
+      // Instead we will try to load the metadata table in the Hive manner first, then fall back and try the
+      // Hadoop location method if that fails.
+      // TODO remove this when we have a Spark workaround for multipart identifiers in SparkSessionCatalog
+      try {
+        return dataFrameReader.load(tableName.replaceFirst("spark_catalog\\.", "") + "." + type);
Review comment:
I don't think I follow. Spark checks
```scala
def isSessionCatalog(catalog: CatalogPlugin): Boolean = {
  catalog.name().equalsIgnoreCase(CatalogManager.SESSION_CATALOG_NAME)
}
```
to decide whether the catalog is the session catalog. If it is, the table lookup must match this pattern, which fails the parsing otherwise:
```scala
object SessionCatalogAndIdentifier {
  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper

  def unapply(parts: Seq[String]): Option[(CatalogPlugin, Identifier)] = parts match {
    case CatalogAndIdentifier(catalog, ident) if CatalogV2Util.isSessionCatalog(catalog) =>
      if (ident.namespace.length != 1) {
        throw new AnalysisException(
          s"The namespace in session catalog must have exactly one name part: ${parts.quoted}")
      }
      Some(catalog, ident)
    case _ => None
  }
}
```
So it doesn't matter whether the table is in the current catalog or not: we can never load a table by name with more than three parts if the name starts with `spark_catalog`. Here we fall back to looking in the default Hive catalog, which is all we can do without direct access to the Spark 3 CatalogPlugins.
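To illustrate why a metadata-table suffix trips this check, here is a minimal sketch of the namespace-length rule using plain string splitting. `IdentifierCheck` and `sessionCatalogAccepts` are illustrative names of my own, not Spark's actual API; the real logic lives in the `SessionCatalogAndIdentifier` extractor quoted above.

```java
// Sketch of the session-catalog namespace check: after dropping the catalog
// name (first part) and the table name (last part), the remaining namespace
// must have exactly one part, or Spark throws an AnalysisException.
public class IdentifierCheck {

  static boolean sessionCatalogAccepts(String multipartName) {
    String[] parts = multipartName.split("\\.");
    // parts[0] is the catalog, parts[parts.length - 1] is the table;
    // everything in between is the namespace.
    int namespaceLength = parts.length - 2;
    return namespaceLength == 1;
  }

  public static void main(String[] args) {
    // "spark_catalog.db.table" -> namespace ["db"], accepted
    System.out.println(sessionCatalogAccepts("spark_catalog.db.table"));       // true
    // "spark_catalog.db.table.files" -> namespace ["db", "table"], rejected.
    // This is why a 4-part metadata-table name cannot be routed through the
    // session catalog, forcing the strip-prefix / fall-back approach in the PR.
    System.out.println(sessionCatalogAccepts("spark_catalog.db.table.files")); // false
  }
}
```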