[GitHub] [iceberg] openinx commented on pull request #1185: Flink: Add the iceberg files committer to collect data files and commit to iceberg table.

GitBox Tue, 18 Aug 2020 23:52:01 -0700


openinx commented on pull request #1185:
URL: https://github.com/apache/iceberg/pull/1185#issuecomment-675887605

For this pull request, there are several issues I need to explain here:
1. @JingsongLi suggested considering two cases: 1. multiple flink jobs
are writing to the same table; 2. a job is restarted as a new job and
continues to write to the same table. For the former case, we will need a
global id, such as the job id or application id, to identify the max
checkpoint id we've committed for a given job. For example, say we have two
flink jobs, job1 and job2:
a. `job1` commits to the iceberg table with maxCheckpointId=1;
b. `job2` commits to the iceberg table with maxCheckpointId=5;
c. `job1` commits to the iceberg table with maxCheckpointId=2;
d. when `job2` starts to commit again, it needs to find the maxCheckpointId
corresponding to `job2`, which is 5 now. It then needs to update its
maxCheckpointId to 6 and commit the txn.

The global job id will also help to resolve the restart case (case 2): a
newly started job has a new id with no committed checkpoint recorded for it,
so we will know that it is starting from checkpoint=1.

2. The operator uid issue (from @kbendick): I read the
[document](https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html#matching-operator-state)
in flink. Indeed, the uid is used for state recovery and needs to be unique
across jobs.
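
The per-job lookup described in point 1 can be sketched roughly as below. This
is only an illustration, not the code in the patch: the `CheckpointTracker`
class, the two summary keys, and the in-memory list standing in for the
table's snapshot history are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an iceberg table's snapshot history.
final class CheckpointTracker {
    // Assumed summary keys; the names are made up for this sketch.
    static final String JOB_ID = "flink.job-id";
    static final String MAX_CHECKPOINT_ID = "flink.max-committed-checkpoint-id";

    // One summary map per committed snapshot, oldest first.
    private final List<Map<String, String>> snapshots = new ArrayList<>();

    // Record a commit made by the given job up to the given checkpoint id.
    void commit(String jobId, long maxCheckpointId) {
        Map<String, String> summary = new HashMap<>();
        summary.put(JOB_ID, jobId);
        summary.put(MAX_CHECKPOINT_ID, Long.toString(maxCheckpointId));
        snapshots.add(summary);
    }

    // Largest checkpoint id committed by this job, ignoring commits from
    // other jobs. Returns 0 when the job has committed nothing yet, so a
    // newly started job begins from checkpoint 1.
    long maxCommittedCheckpointId(String jobId) {
        long max = 0L;
        for (Map<String, String> summary : snapshots) {
            if (jobId.equals(summary.get(JOB_ID))) {
                max = Math.max(max, Long.parseLong(summary.get(MAX_CHECKPOINT_ID)));
            }
        }
        return max;
    }
}
```

Replaying steps a to c against this sketch, `maxCommittedCheckpointId("job2")`
returns 5 even though the most recent commit came from `job1`, and a job id
that has never committed returns 0.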

I've addressed these two issues in the new patch, and also attached the unit tests.

