openinx commented on a change in pull request #1404:
URL: https://github.com/apache/iceberg/pull/1404#discussion_r480686600
##########
File path:
flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -88,6 +88,13 @@
private transient Table table;
private transient long maxCommittedCheckpointId;
+ // There're two cases that we restore from flink checkpoints: the first case is restoring from snapshot created by the
+ // same flink job; another case is restoring from snapshot created by another different job. For the second case, we
+ // need to maintain the old flink job's id in flink state backend to find the max-committed-checkpoint-id when
+ // traversing iceberg table's snapshots.
+ private static final ListStateDescriptor<String> JOB_ID_DESCRIPTOR = new ListStateDescriptor<>(
Review comment:
We've considered using avro or a POJO to consolidate the job id and the data
files into a single structure. There are really two separate issues here:
    1. Using avro or a POJO to serialize/deserialize. With avro we would need
the detailed schema for the whole structure, but `DataFile` hides its schema
inside a non-public implementation, `GenericDataFile`; that was designed
intentionally because we don't want iceberg-core to expose the detailed schema
to upper-layer users. A POJO would require getters/setters for all fields,
while `DataFile` doesn't support setters today.
    2. Merging them into a single structure. I think we could, but I'm not
sure what the benefit would be.
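For context, the restore path described in the diff comment can be sketched as
follows: the committer keeps the originating Flink job's id in operator state,
and after restoring from another job's checkpoint it scans the Iceberg table's
snapshots (newest first) for the largest checkpoint id committed under that job
id. This is a simplified, dependency-free sketch: each snapshot is modeled as a
plain summary map, and the summary keys `flink.job-id` and
`flink.max-committed-checkpoint-id` are assumptions mirroring the idea in the
PR, not the actual Iceberg API.

```java
import java.util.List;
import java.util.Map;

// Simplified model of recovering the max-committed-checkpoint-id after
// restoring from a snapshot created by a different Flink job. In the real
// committer these maps would be Iceberg Snapshot summaries.
public class MaxCommittedCheckpointId {
  // Assumed snapshot-summary keys, written by the committer on each commit.
  static final String JOB_ID_KEY = "flink.job-id";
  static final String CHECKPOINT_ID_KEY = "flink.max-committed-checkpoint-id";

  // Traverse snapshots newest-to-oldest and return the checkpoint id of the
  // most recent snapshot committed by the given (restored) job id.
  static long maxCommittedCheckpointId(
      List<Map<String, String>> snapshotsNewestFirst, String jobId) {
    for (Map<String, String> summary : snapshotsNewestFirst) {
      if (jobId.equals(summary.get(JOB_ID_KEY))
          && summary.containsKey(CHECKPOINT_ID_KEY)) {
        return Long.parseLong(summary.get(CHECKPOINT_ID_KEY));
      }
    }
    return -1L; // no snapshot committed by that job yet
  }
}
```

The job id stored in the `ListStateDescriptor<String>` state is what makes this
lookup possible when the restored checkpoint belongs to a different job.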
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]