rdblue commented on a change in pull request #1348:
URL: https://github.com/apache/iceberg/pull/1348#discussion_r482592010
##########
File path:
flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -164,16 +168,51 @@ private void commitUpToCheckpoint(long checkpointId) {
pendingDataFiles.addAll(dataFiles);
}
- AppendFiles appendFiles = table.newAppend();
- pendingDataFiles.forEach(appendFiles::appendFile);
- appendFiles.set(MAX_COMMITTED_CHECKPOINT_ID, Long.toString(checkpointId));
- appendFiles.set(FLINK_JOB_ID, flinkJobId);
- appendFiles.commit();
+ if (replacePartitions) {
+ replacePartitions(pendingDataFiles, checkpointId);
+ } else {
+ append(pendingDataFiles, checkpointId);
+ }
// Clear the committed data files from dataFilesPerCheckpoint.
pendingFileMap.clear();
}
+ private void replacePartitions(List<DataFile> dataFiles, long checkpointId) {
+ ReplacePartitions dynamicOverwrite = table.newReplacePartitions();
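The hunk is truncated at the line under review. A minimal sketch of how the new helper plausibly continues, mirroring the removed append path above (an assumption, not the PR's actual code; `table`, `flinkJobId`, and the summary keys come from the surrounding class):
```java
private void replacePartitions(List<DataFile> dataFiles, long checkpointId) {
  // Dynamic overwrite: every partition touched by the incoming files is
  // replaced wholesale with those files.
  ReplacePartitions dynamicOverwrite = table.newReplacePartitions();
  dataFiles.forEach(dynamicOverwrite::addFile);
  // Record the checkpoint metadata in the snapshot summary, as the
  // removed append path did.
  dynamicOverwrite.set(MAX_COMMITTED_CHECKPOINT_ID, Long.toString(checkpointId));
  dynamicOverwrite.set(FLINK_JOB_ID, flinkJobId);
  dynamicOverwrite.commit();
}
```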
Review comment:
I just want to note that we don't encourage using
`ReplacePartitions` because the data it deletes is implicit. It is better to
state explicitly what data should be overwritten, as in the new Spark API:
```scala
df.writeTo("iceberg.db.table").overwrite($"date" === "2020-09-01")
```
If Flink's overwrite semantics are defined as replacing partitions, then it
should be okay. But I highly recommend being more explicit about what data is
replaced.
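For comparison, the explicit form in Iceberg's Java API goes through `OverwriteFiles` with a row filter, so the deleted data is spelled out in the commit itself. A minimal sketch (the `date` column and value, `table`, and `dataFiles` are illustrative assumptions):
```java
import org.apache.iceberg.OverwriteFiles;
import org.apache.iceberg.expressions.Expressions;

// Delete exactly the rows matching the filter, then add the replacement
// files; nothing is removed implicitly.
OverwriteFiles overwrite = table.newOverwrite()
    .overwriteByRowFilter(Expressions.equal("date", "2020-09-01"));
dataFiles.forEach(overwrite::addFile);
overwrite.commit();
```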