RussellSpitzer commented on a change in pull request #1784:
URL: https://github.com/apache/iceberg/pull/1784#discussion_r526584682
##########
File path:
spark3/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction3.java
##########
@@ -110,4 +112,54 @@ public void testSparkCatalogHiveTable() throws TableAlreadyExistsException, NoSu
results.contains("file:" + location + "/data/trashfile"));
}
+ @Test
+ public void testSparkSessionCatalogHadoopTable() throws Exception {
+ spark.conf().set("spark.sql.catalog.spark_catalog",
"org.apache.iceberg.spark.SparkSessionCatalog");
+ spark.conf().set("spark.sql.catalog.spark_catalog.type", "hadoop");
+ spark.conf().set("spark.sql.catalog.spark_catalog.warehouse",
tableLocation);
+ SparkSessionCatalog cat = (SparkSessionCatalog) spark.sessionState().catalogManager().v2SessionCatalog();
+
+ String[] database = {"default"};
+ Identifier id = Identifier.of(database, "table");
+ Map<String, String> options = Maps.newHashMap();
+ Transform[] transforms = {};
+ cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
+ SparkTable table = (SparkTable) cat.loadTable(id);
+
+ spark.sql("INSERT INTO default.table VALUES (1,1,1)");
+
+ String location = table.table().location().replaceFirst("file:", "");
+ new File(location + "/data/trashfile").createNewFile();
+
+ List<String> results = Actions.forTable(table.table()).removeOrphanFiles()
+ .olderThan(System.currentTimeMillis() + 1000).execute();
+ Assert.assertTrue("trash file should be removed",
+ results.contains("file:" + location + "/data/trashfile"));
+ }
+
+ @Test
+ public void testSparkSessionCatalogHiveTable() throws Exception {
+ spark.conf().set("spark.sql.catalog.spark_catalog",
"org.apache.iceberg.spark.SparkSessionCatalog");
+ spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
+ SparkSessionCatalog cat = (SparkSessionCatalog) spark.sessionState().catalogManager().v2SessionCatalog();
+
+ String[] database = {"default"};
+ Identifier id = Identifier.of(database, "sessioncattest");
+ Map<String, String> options = Maps.newHashMap();
+ Transform[] transforms = {};
+ cat.dropTable(id);
+ cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
+ SparkTable table = (SparkTable) cat.loadTable(id);
+
+ spark.sql("INSERT INTO default.sessioncattest VALUES (1,1,1)");
+
+ String location = table.table().location().replaceFirst("file:", "");
+ new File(location + "/data/trashfile").createNewFile();
+
+ List<String> results = Actions.forTable(table.table()).removeOrphanFiles()
+ .olderThan(System.currentTimeMillis() + 1000).execute();
Review comment:
Discussed on Slack; notes here:
The main reason I didn't do this originally is that it would require splitting the code into Spark 3 and Spark 2 versions. But if we could do that, we wouldn't need this fallback at all: in Spark 3 mode we could load the table by getting the catalog directly and loading from there, instead of falling back to the non-catalog path. I think we are going to go down the path of adding Spark-version-specific code here.
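
A rough sketch of what that Spark 3 direct-catalog path could look like, assuming the session catalog is configured the way these tests set it up; the catalog name and identifier are taken from the tests above, and the overall shape is illustrative rather than a committed design:

```java
import java.util.List;

import org.apache.iceberg.actions.Actions;
import org.apache.iceberg.spark.source.SparkTable;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.TableCatalog;

// Resolve the configured catalog directly instead of falling back to the
// non-catalog path. This only works in Spark 3, since CatalogManager does
// not exist in Spark 2 -- hence the need for version-specific code.
TableCatalog catalog = (TableCatalog)
    spark.sessionState().catalogManager().catalog("spark_catalog");
Identifier id = Identifier.of(new String[] {"default"}, "sessioncattest");
SparkTable sparkTable = (SparkTable) catalog.loadTable(id);

// The underlying Iceberg table can then be handed to the action as usual.
List<String> removed = Actions.forTable(sparkTable.table())
    .removeOrphanFiles()
    .olderThan(System.currentTimeMillis() + 1000)
    .execute();
```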