openinx commented on pull request #1185:
URL: https://github.com/apache/iceberg/pull/1185#issuecomment-675887605


   For this pull request, there are several issue I need to explain here: 
   1.  @JingsongLi suggest to consider the two cases: 1.  multiple flink jobs 
are writing the same table;  2. restart with a new job and continue to write 
the same time.  For the former case,  we will need a global id, such as job id 
or application id,  to identify the max checkpoint id we've committed for a 
given job. For example, we have two flink job: job1 and job2: 
     a.  `job1` commit the iceberg table with maxCheckpointId=1 ; 
     b.  the second `job2` commit the iceberg table with maxCheckpointId=5; 
     c.  `job1` commit the iceberg table with maxCheckpointId=2; 
     d.  the `job2` start to commit again, it need to find the maxCheckpointId 
corresponding to `job2`, which is 5 now. Then it need to rewrite its 
maxCheckpointId = 6 and commit the txn. 
   
   The global id of job will also help to resolve the restart issue2, because 
we will know that the newly started job is starting from checkpoint=1. 
   
   2.  the uid of operator (from @kbendick) issue,  I read the 
[document](https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html#matching-operator-state)
 in flink. Indeed, it's used for state recover and need to be unique across 
jobs. 
   
   I've address this two issue in the new patch , also attached the unit tests. 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to