RussellSpitzer commented on a change in pull request #1784:
URL: https://github.com/apache/iceberg/pull/1784#discussion_r526591588
##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -128,16 +129,35 @@
return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
}
+  private static Dataset<Row> loadMetadataTableFromCatalog(SparkSession spark, String tableName, String tableLocation,
+                                                           MetadataTableType type) {
+    DataFrameReader dataFrameReader = spark.read().format("iceberg");
+    if (tableName.startsWith("spark_catalog")) {
+      // Due to the design of Spark, we cannot pass multi-element namespaces to the session catalog.
+      // We also don't know whether the catalog is Hive or Hadoop based, so we can't just load one way or the other.
+      // Instead we will try to load the metadata table in the Hive manner first, then fall back and try the
+      // Hadoop location method if that fails.
+      // TODO remove this when we have a Spark workaround for multipart identifiers in SparkSessionCatalog
+      try {
+        return dataFrameReader.load(tableName.replaceFirst("spark_catalog\\.", "") + "." + type);
Review comment:
I don't think I follow. Spark checks
```scala
def isSessionCatalog(catalog: CatalogPlugin): Boolean = {
  catalog.name().equalsIgnoreCase(CatalogManager.SESSION_CATALOG_NAME)
}
```
to decide whether the catalog is the session catalog. If it is, the table lookup must match this pattern, which fails the parsing otherwise:
```scala
object SessionCatalogAndIdentifier {
  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.MultipartIdentifierHelper

  def unapply(parts: Seq[String]): Option[(CatalogPlugin, Identifier)] = parts match {
    case CatalogAndIdentifier(catalog, ident) if CatalogV2Util.isSessionCatalog(catalog) =>
      if (ident.namespace.length != 1) {
        throw new AnalysisException(
          s"The namespace in session catalog must have exactly one name part: ${parts.quoted}")
      }
      Some(catalog, ident)
    case _ => None
  }
}
```
So it doesn't matter whether the table is in the current catalog or not: we can never load a table by name with more than three parts if the name starts with `spark_catalog`. Here we fall back to looking in the default Hive catalog, which is all we can do without direct access to the Spark 3 CatalogPlugins.
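To illustrate why a metadata-table suffix trips this check, here is a minimal sketch of the namespace-length rule using plain string splitting. `IdentifierCheck` and `sessionCatalogAccepts` are illustrative names of my own, not Spark's actual API; the real logic lives in the `SessionCatalogAndIdentifier` extractor quoted above.

```java
// Sketch of the session-catalog namespace check: after dropping the catalog
// name (first part) and the table name (last part), the remaining namespace
// must have exactly one part, or Spark throws an AnalysisException.
public class IdentifierCheck {

  static boolean sessionCatalogAccepts(String multipartName) {
    String[] parts = multipartName.split("\\.");
    // parts[0] is the catalog, parts[parts.length - 1] is the table;
    // everything in between is the namespace.
    int namespaceLength = parts.length - 2;
    return namespaceLength == 1;
  }

  public static void main(String[] args) {
    // "spark_catalog.db.table" -> namespace ["db"], accepted
    System.out.println(sessionCatalogAccepts("spark_catalog.db.table"));       // true
    // "spark_catalog.db.table.files" -> namespace ["db", "table"], rejected.
    // This is why a 4-part metadata-table name cannot be routed through the
    // session catalog, forcing the strip-prefix / fall-back approach in the PR.
    System.out.println(sessionCatalogAccepts("spark_catalog.db.table.files")); // false
  }
}
```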