openinx commented on a change in pull request #1404:
URL: https://github.com/apache/iceberg/pull/1404#discussion_r480686600
##########
File path:
flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -88,6 +88,13 @@
private transient Table table;
private transient long maxCommittedCheckpointId;
+ // There're two cases that we restore from flink checkpoints: the first case is restoring from snapshot created by the
+ // same flink job; another case is restoring from snapshot created by another different job. For the second case, we
+ // need to maintain the old flink job's id in flink state backend to find the max-committed-checkpoint-id when
+ // traversing iceberg table's snapshots.
+ private static final ListStateDescriptor<String> JOB_ID_DESCRIPTOR = new ListStateDescriptor<>(
Review comment:
We've considered using avro or a POJO to consolidate the job id and the data
files into a single structure. There are really two separate issues here:
    1. Using avro or a POJO to serialize/deserialize. With avro we would need
the detailed schema for the whole structure, but `DataFile` hides its schema
inside a non-public implementation, `GenericDataFile`; that was designed
intentionally because we don't want iceberg-core to expose the detailed schema
to upper-layer users. A POJO would require getters/setters for all fields,
while `DataFile` doesn't support setters today.
    2. Merging them into a single structure. I think we could, but I'm not
sure what the benefit would be.
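For context, the restore path described in the diff comment can be sketched as
follows: the committer keeps the originating Flink job's id in operator state,
and after restoring from another job's checkpoint it scans the Iceberg table's
snapshots (newest first) for the largest checkpoint id committed under that job
id. This is a simplified, dependency-free sketch: each snapshot is modeled as a
plain summary map, and the summary keys `flink.job-id` and
`flink.max-committed-checkpoint-id` are assumptions mirroring the idea in the
PR, not the actual Iceberg API.

```java
import java.util.List;
import java.util.Map;

// Simplified model of recovering the max-committed-checkpoint-id after
// restoring from a snapshot created by a different Flink job. In the real
// committer these maps would be Iceberg Snapshot summaries.
public class MaxCommittedCheckpointId {
  // Assumed snapshot-summary keys, written by the committer on each commit.
  static final String JOB_ID_KEY = "flink.job-id";
  static final String CHECKPOINT_ID_KEY = "flink.max-committed-checkpoint-id";

  // Traverse snapshots newest-to-oldest and return the checkpoint id of the
  // most recent snapshot committed by the given (restored) job id.
  static long maxCommittedCheckpointId(
      List<Map<String, String>> snapshotsNewestFirst, String jobId) {
    for (Map<String, String> summary : snapshotsNewestFirst) {
      if (jobId.equals(summary.get(JOB_ID_KEY))
          && summary.containsKey(CHECKPOINT_ID_KEY)) {
        return Long.parseLong(summary.get(CHECKPOINT_ID_KEY));
      }
    }
    return -1L; // no snapshot committed by that job yet
  }
}
```

The job id stored in the `ListStateDescriptor<String>` state is what makes this
lookup possible when the restored checkpoint belongs to a different job.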
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]