Hi,

RewriteFiles implements a swap operation. For example, you might compact file_1 with file_2 to produce file_3, then swap the files. The correctness of this operation depends on having both file_1 and file_2 in the table, or else the compaction would un-delete or duplicate rows. That's why it validates that each file is actually removed from the dataset.
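
To make the swap check concrete, here is a toy model of it in Scala. This is only a sketch of the idea, not Iceberg's implementation: ToyTable and its rewrite method are made-up names, and a "table" here is nothing more than a set of live file paths.

```scala
// Toy model only: a "table" is just the set of live data file paths.
// ToyTable and rewrite are hypothetical names, not part of Iceberg.
final case class ToyTable(files: Set[String]) {

  // Swap `toDelete` for `toAdd`, but only if every file being replaced
  // is still live. If one is already gone, adding the compacted file
  // could bring back deleted rows or duplicate existing ones.
  def rewrite(toDelete: Set[String], toAdd: Set[String]): Either[String, ToyTable] = {
    val missing = toDelete.diff(files)
    if (missing.nonEmpty)
      Left("Missing required files to delete: " + missing.toSeq.sorted.mkString(", "))
    else
      Right(ToyTable(files.diff(toDelete).union(toAdd)))
  }
}

object ToyTableDemo {
  def main(args: Array[String]): Unit = {
    val table = ToyTable(Set("file_1", "file_2"))

    // First swap: both inputs are live, so the rewrite is accepted.
    val compacted = table.rewrite(Set("file_1", "file_2"), Set("file_3"))
    println(compacted)

    // Same swap again: file_1 and file_2 are no longer live, so it is rejected.
    val retried = compacted.flatMap(_.rewrite(Set("file_1", "file_2"), Set("file_3")))
    println(retried)
  }
}
```

In this model the second call is rejected in the same way your commit was: the files to replace are not live in the table, so the swap cannot be shown to be safe.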
If you want an idempotent delete, then you should use the DeleteFiles API that doesn't add validation.

rb

On Thu, Sep 19, 2019 at 12:33 AM 李响 <[email protected]> wrote:

> Dear community,
>
> I am trying to re-write a couple of data files in a table, like
>
> val fileToDelete1 = DataFiles.builder(partitionSpec)
>   ...
>   .withPath(delete_path_1)
>   ...
>   .build
> val fileToDelete2 = DataFiles.builder(partitionSpec)
>   ...
>   .withPath(delete_path_2)
>   ...
>   .build
> val fileToAdd = DataFiles.builder(partitionSpec)
>   ...
>   .withPath(add1)
>   ...
>   .build
>
> table.newRewrite()
>   .rewriteFiles(JavaConversions.setAsJavaSet(Set(fileToDelete1, fileToDelete2)),
>     JavaConversions.setAsJavaSet(Set(fileToAdd)))
>   .commit()
>
> And it is rejected by
> Exception in thread "main" org.apache.iceberg.exceptions.ValidationException: Missing required files to delete: delete_path_1, delete_path_2
>   at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:42)
>   at org.apache.iceberg.MergingSnapshotProducer.apply(MergingSnapshotProducer.java:275)
>   at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:146)
>   at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:238)
>   at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:403)
>   at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:212)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
>   at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:188)
>   at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:237)
>   at Test$.main(Test.scala:68)
>   at Test.main(Test.scala)
>
> The logic in MergingSnapshotProducer (line 275
> <https://github.com/apache/incubator-iceberg/blob/433f169e9d0b10688d395abde64c4b6461d35ca9/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L275>)
> makes that happen. failMissingDeletePaths is true. And
> deletedFiles.containsAll(deletePaths) is false, as deleteFiles is empty.
>
> My questions are
> (1) What does failMissingDeletePaths mean? Whether to fail if missing
> delete paths happens?
> (2) How to make deletedFiles not being empty (it is supposed to contain
> the files to delete I believe)?
>
> I am reading the code but do not figure it out yet. Really appreciate it
> if you could share your thoughts at your most convenience. Thanks!
>
> --
>
> 李响 Xiang Li
>
> 手机 cellphone :+86-136-8113-8972
> 邮件 e-mail :[email protected]
>

--
Ryan Blue
Software Engineer
Netflix
