rdblue commented on a change in pull request #1784:
URL: https://github.com/apache/iceberg/pull/1784#discussion_r526506179
##########
File path: spark/src/main/java/org/apache/iceberg/actions/BaseSparkAction.java
##########
@@ -128,16 +129,35 @@
     return manifestDF.union(otherMetadataFileDF).union(manifestListDF);
   }
+  private static Dataset<Row> loadMetadataTableFromCatalog(SparkSession spark, String tableName,
+                                                           String tableLocation, MetadataTableType type) {
+    DataFrameReader dataFrameReader = spark.read().format("iceberg");
+    if (tableName.startsWith("spark_catalog")) {
+      // Due to the design of Spark, we cannot pass multi-element namespaces to the session catalog.
+      // We also don't know whether the catalog is Hive or Hadoop based, so we can't just load one way or the other.
+      // Instead, we will try to load the metadata table in the Hive manner first, then fall back and try the
+      // Hadoop location method if that fails.
+      // TODO: remove this when we have a Spark workaround for multipart identifiers in SparkSessionCatalog
+      try {
+        return dataFrameReader.load(tableName.replaceFirst("spark_catalog\\.", "") + "." + type);
Review comment:
If we know that the catalog is `spark_catalog`, then we should just try to load without removing the catalog name. If we remove the catalog name, we can't be sure the right table is loaded, because the stripped name will be resolved by the session's current catalog, which may not be `spark_catalog`.
And, if appending the metadata table type works at all, then it would also work with the `spark_catalog` prefix. Names like `spark_catalog.db.table` work; it is only `spark_catalog.db.table.meta` that does not. If `meta` is appended and the current catalog is `spark_catalog`, then I think it will fail no matter what.
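A hedged sketch of the ordering this comment suggests, reusing names from the hunk above (the broad `catch` and the location-based fallback are assumptions, not code from the PR):

```java
try {
  // Keep the catalog name so resolution does not depend on the session's current catalog.
  return dataFrameReader.load(tableName + "." + type);
} catch (Exception e) {
  // If the multipart identifier fails, fall back to the location-based load rather than
  // stripping the prefix, which only helps when spark_catalog is the current catalog anyway.
  return dataFrameReader.load(tableLocation + "#" + type);
}
```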