[GitHub] [iceberg] hililiwei commented on a diff in pull request #4325: Spark:Skip corrupt files in Spark Procedures and Actions

GitBox Mon, 09 May 2022 20:45:29 -0700


hililiwei commented on code in PR #4325:
URL: https://github.com/apache/iceberg/pull/4325#discussion_r868786394



##########
spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java:
##########
@@ -759,6 +759,51 @@ public void testPartitionedImportFromEmptyPartitionDoesNotThrow() {
         sql("SELECT * FROM %s ORDER BY id", tableName));
   }
 
+  @Test
+  public void testSkipOnError() throws IOException {
+    createUnpartitionedFileTable("parquet");
+
+    List<Object[]> source = sql("SELECT * FROM %s ORDER BY id", sourceTableName);
+    Assert.assertEquals(String.format("Rows in source table did not match\nExpected :%s rows \nFound    :%s",
+        8, source.size()), 8, source.size());
+
+    String createIceberg =
+        "CREATE TABLE %s (id Integer, name String, dept String, subdept String) USING iceberg";
+
+    sql(createIceberg, tableName);
+
+    File[] expectedFiles = fileTableDir.listFiles((dir, name) -> !name.endsWith("crc") && !name.contains("_SUCCESS"));
+
+    Assert.assertEquals("Expected number of source files", 2, expectedFiles.length);
+
+    // Corrupt the second file
+    Assume.assumeTrue("Delete source file!", expectedFiles[1].delete());

Review Comment:
   Thanks for the guidance. On reflection, beyond the points you raised, this
part is indeed somewhat redundant: instead of deleting one of the source files,
we can achieve the same effect by creating an empty file with the `parquet` suffix.
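A minimal sketch of that suggestion, using only the JDK. It assumes the skip-on-error path treats any data file that cannot be read as Parquet as corrupt. A valid Parquet file begins and ends with the 4-byte magic `PAR1`, so a zero-byte file with a `.parquet` name cannot be parsed. The class `EmptyParquetFileSketch` and the helpers `createEmptyParquetFile` and `listDataFiles` are hypothetical illustrations, not part of the PR; `listDataFiles` simply reuses the same name filter as the test.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyParquetFileSketch {

  // Hypothetical helper: creates a zero-byte file named <baseName>.parquet in dir.
  // Parquet files start and end with the magic bytes "PAR1", so an empty file
  // cannot be read as Parquet and stands in for a corrupt data file.
  static File createEmptyParquetFile(File dir, String baseName) throws IOException {
    Path path = dir.toPath().resolve(baseName + ".parquet");
    Files.createFile(path); // fails if the file already exists
    return path.toFile();
  }

  // Same filter as in the test: skip checksum files and the _SUCCESS marker.
  static File[] listDataFiles(File dir) {
    return dir.listFiles((d, name) -> !name.endsWith("crc") && !name.contains("_SUCCESS"));
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("file-table").toFile();

    // Simulate a source directory: one "corrupt" data file plus Spark side files.
    File empty = createEmptyParquetFile(dir, "corrupt-00000");
    Files.createFile(dir.toPath().resolve("_SUCCESS"));
    Files.createFile(dir.toPath().resolve(".corrupt-00000.parquet.crc"));

    // Only the empty .parquet file survives the filter, and it has length 0.
    File[] dataFiles = listDataFiles(dir);
    System.out.println(dataFiles.length + " " + empty.length()); // prints "1 0"
  }
}
```

In the actual test, the empty file would presumably be created next to the two real files in `fileTableDir`. The table then contains a genuinely unreadable file, so the test no longer depends on `Assume.assumeTrue` succeeding in deleting a source file.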



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
