RussellSpitzer commented on a change in pull request #1784:
URL: https://github.com/apache/iceberg/pull/1784#discussion_r526589475



##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -128,16 +129,35 @@
     return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
   }
 
+  private static Dataset<Row> loadMetadataTableFromCatalog(SparkSession spark, String tableName, String tableLocation,
+                                                           MetadataTableType type) {
+    DataFrameReader dataFrameReader = spark.read().format("iceberg");
+    if (tableName.startsWith("spark_catalog")) {
+      // Due to the design of Spark, we cannot pass multi-element namespaces to the session catalog.
+      // We also don't know whether the catalog is Hive or Hadoop based, so we can't just load one way or the other.
+      // Instead we will try to load the metadata table in the Hive manner first, then fall back to the Hadoop location method if that fails.
+      // TODO remove this when we have a Spark workaround for multipart identifiers in SparkSessionCatalog
+      try {
+        return dataFrameReader.load(tableName.replaceFirst("spark_catalog\\.", "") + "." + type);
+      } catch (NoSuchTableException noSuchTableException) {
+        return dataFrameReader.load(tableLocation + "#" + type);
+      }
+    } else {
+      return spark.table(tableName + "." + type);
+    }
+  }
+
   protected static Dataset<Row> loadMetadataTable(SparkSession spark, String tableName, String tableLocation,
                                                   MetadataTableType type) {
-    DataFrameReader noCatalogReader = spark.read().format("iceberg");
+    DataFrameReader dataFrameReader = spark.read().format("iceberg");
     if (tableName.contains("/")) {
       // Hadoop Table or Metadata location passed, load without a catalog
-      return noCatalogReader.load(tableName + "#" + type);
+      return dataFrameReader.load(tableName + "#" + type);
     }
     // Try catalog-based name resolution
     try {
-      return spark.table(tableName + "." + type);
+      loadMetadataTableFromCatalog(spark, tableName, tableLocation, type);

Review comment:
       yep sorry, when I renamed this I forgot to add the return
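       For reference, the try-then-fall-back shape of the renamed method (with the missing `return` in place) can be sketched in isolation like this. The types here are stand-ins, not the Spark/Iceberg API; the loader functions and `NoSuchTableException` stub are illustrative only:

       ```java
       import java.util.function.Function;

       public class FallbackLoaderSketch {

         // Stand-in for the real NoSuchTableException thrown by the catalog path.
         static class NoSuchTableException extends RuntimeException {}

         /**
          * Tries the primary (catalog-style) loader first; if the table is not
          * found, falls back to the location-style loader. Both branches must
          * RETURN their result -- dropping the `return` on the first call is
          * exactly the bug this review comment is about.
          */
         static String load(String name,
                            Function<String, String> catalogLoader,
                            Function<String, String> locationLoader) {
           try {
             return catalogLoader.apply(name);
           } catch (NoSuchTableException e) {
             return locationLoader.apply(name);
           }
         }

         public static void main(String[] args) {
           Function<String, String> catalogLoader = n -> {
             if (!n.equals("db.tbl.files")) {
               throw new NoSuchTableException();
             }
             return "catalog-result";
           };
           Function<String, String> locationLoader = n -> "location-result:" + n;

           // Known to the "catalog": resolved directly.
           System.out.println(load("db.tbl.files", catalogLoader, locationLoader));
           // Unknown: NoSuchTableException triggers the location-based fallback.
           System.out.println(load("other.tbl.files", catalogLoader, locationLoader));
         }
       }
       ```

       Without the `return`, `load` would fall through after a successful catalog lookup and discard the result, which is why the fix is a one-word change.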




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


