RussellSpitzer commented on a change in pull request #2303:
URL: https://github.com/apache/iceberg/pull/2303#discussion_r609772679



##########
File path: flink/src/main/java/org/apache/iceberg/flink/source/RowDataRewriter.java
##########
@@ -124,16 +129,28 @@ public void open(Configuration parameters) {
     }
 
     @Override
-    public List<DataFile> map(CombinedScanTask task) throws Exception {
+    public RewriteResult map(CombinedScanTask task) throws Exception {
+      // Initialize the builder of ResultWriter.
+      RewriteResult.Builder resultBuilder = RewriteResult.builder();
+      for (FileScanTask scanTask : task.files()) {
+        resultBuilder.addDataFilesToDelete(scanTask.file());
+        resultBuilder.addDeleteFilesToDelete(scanTask.deletes());

Review comment:
       I believe the spec allows a delete file to reference multiple data 
files, so the fact that a delete file is associated with a data file being 
rewritten doesn't mean the delete file can be removed. Instead, you would need 
to check that no live data files are still referenced by the delete file. This 
probably requires getting hold of a reversed delete file index.
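
To illustrate the check I have in mind, here is a rough sketch rather than a proposed implementation. It assumes plain `String` paths stand in for Iceberg's `DataFile` and `DeleteFile`, and that the caller can supply a map from every live data file to the delete files that apply to it. The class `ReverseDeleteIndexSketch` and both method names are hypothetical and are not part of the Iceberg API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: Strings stand in for Iceberg DataFile / DeleteFile objects.
final class ReverseDeleteIndexSketch {
  private ReverseDeleteIndexSketch() {
  }

  // Invert "data file -> delete files" into "delete file -> data files it applies to".
  static Map<String, Set<String>> reverseIndex(Map<String, List<String>> deletesByDataFile) {
    Map<String, Set<String>> dataFilesByDelete = new HashMap<>();
    for (Map.Entry<String, List<String>> entry : deletesByDataFile.entrySet()) {
      for (String deleteFile : entry.getValue()) {
        dataFilesByDelete
            .computeIfAbsent(deleteFile, key -> new HashSet<>())
            .add(entry.getKey());
      }
    }
    return dataFilesByDelete;
  }

  // A delete file is only safe to drop when every live data file it applies to
  // is part of the rewrite; otherwise some untouched data file still needs it.
  static Set<String> removableDeleteFiles(
      Map<String, List<String>> deletesByDataFile, Set<String> rewrittenDataFiles) {
    Set<String> removable = new TreeSet<>();
    for (Map.Entry<String, Set<String>> entry : reverseIndex(deletesByDataFile).entrySet()) {
      if (rewrittenDataFiles.containsAll(entry.getValue())) {
        removable.add(entry.getKey());
      }
    }
    return removable;
  }
}
```

For example, if `delete-2` applies to both `data-a` and `data-b` and only `data-a` is rewritten, `delete-2` has to stay, while a delete file that applies only to `data-a` can go. Note that the input map must cover all live data files in the table, not just the files in the current `CombinedScanTask`, or the check would wrongly treat shared delete files as removable.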




