RussellSpitzer commented on a change in pull request #1784:
URL: https://github.com/apache/iceberg/pull/1784#discussion_r526584682
##########
File path:
spark3/src/test/java/org/apache/iceberg/actions/TestRemoveOrphanFilesAction3.java
##########
@@ -110,4 +112,54 @@ public void testSparkCatalogHiveTable() throws TableAlreadyExistsException, NoSu
results.contains("file:" + location + "/data/trashfile"));
}
+ @Test
+ public void testSparkSessionCatalogHadoopTable() throws Exception {
+ spark.conf().set("spark.sql.catalog.spark_catalog",
"org.apache.iceberg.spark.SparkSessionCatalog");
+ spark.conf().set("spark.sql.catalog.spark_catalog.type", "hadoop");
+ spark.conf().set("spark.sql.catalog.spark_catalog.warehouse",
tableLocation);
+ SparkSessionCatalog cat = (SparkSessionCatalog) spark.sessionState().catalogManager().v2SessionCatalog();
+
+ String[] database = {"default"};
+ Identifier id = Identifier.of(database, "table");
+ Map<String, String> options = Maps.newHashMap();
+ Transform[] transforms = {};
+ cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
+ SparkTable table = (SparkTable) cat.loadTable(id);
+
+ spark.sql("INSERT INTO default.table VALUES (1,1,1)");
+
+ String location = table.table().location().replaceFirst("file:", "");
+ new File(location + "/data/trashfile").createNewFile();
+
+ List<String> results = Actions.forTable(table.table()).removeOrphanFiles()
+ .olderThan(System.currentTimeMillis() + 1000).execute();
+ Assert.assertTrue("trash file should be removed",
+ results.contains("file:" + location + "/data/trashfile"));
+ }
+
+ @Test
+ public void testSparkSessionCatalogHiveTable() throws Exception {
+ spark.conf().set("spark.sql.catalog.spark_catalog",
"org.apache.iceberg.spark.SparkSessionCatalog");
+ spark.conf().set("spark.sql.catalog.spark_catalog.type", "hive");
+ SparkSessionCatalog cat = (SparkSessionCatalog) spark.sessionState().catalogManager().v2SessionCatalog();
+
+ String[] database = {"default"};
+ Identifier id = Identifier.of(database, "sessioncattest");
+ Map<String, String> options = Maps.newHashMap();
+ Transform[] transforms = {};
+ cat.dropTable(id);
+ cat.createTable(id, SparkSchemaUtil.convert(SCHEMA), transforms, options);
+ SparkTable table = (SparkTable) cat.loadTable(id);
+
+ spark.sql("INSERT INTO default.sessioncattest VALUES (1,1,1)");
+
+ String location = table.table().location().replaceFirst("file:", "");
+ new File(location + "/data/trashfile").createNewFile();
+
+ List<String> results = Actions.forTable(table.table()).removeOrphanFiles()
+ .olderThan(System.currentTimeMillis() + 1000).execute();
Review comment:
Discussed on Slack; notes here:
The main reason I didn't do this originally is that it would require splitting the code into Spark 3 and Spark 2 versions. But if we could do that, we wouldn't need this fallback at all: in Spark 3 mode we could load the table by getting the catalog directly and loading from there, instead of falling back to the non-catalog path. I think we are going to go down the path of adding Spark-version-specific code here.
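
A rough sketch of what that Spark 3 direct-catalog path could look like, assuming the session catalog is configured the way these tests set it up; the catalog name and identifier are taken from the tests above, and the overall shape is illustrative rather than a committed design:

```java
import java.util.List;

import org.apache.iceberg.actions.Actions;
import org.apache.iceberg.spark.source.SparkTable;
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.TableCatalog;

// Resolve the configured catalog directly instead of falling back to the
// non-catalog path. This only works in Spark 3, since CatalogManager does
// not exist in Spark 2 -- hence the need for version-specific code.
TableCatalog catalog = (TableCatalog)
    spark.sessionState().catalogManager().catalog("spark_catalog");
Identifier id = Identifier.of(new String[] {"default"}, "sessioncattest");
SparkTable sparkTable = (SparkTable) catalog.loadTable(id);

// The underlying Iceberg table can then be handed to the action as usual.
List<String> removed = Actions.forTable(sparkTable.table())
    .removeOrphanFiles()
    .olderThan(System.currentTimeMillis() + 1000)
    .execute();
```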