elkhand commented on issue #2033:
URL: https://github.com/apache/iceberg/issues/2033#issuecomment-767056823


   Thank you @kezhuw @pnowojski 
   
   This is the call order of `endInput()`:
   
![image](https://user-images.githubusercontent.com/4366998/105750587-c9c60e80-5ef9-11eb-9d27-0b54e0f10b80.png)
   
   New findings:
   This issue occurs when you take a savepoint with the command that also terminates the job:
   ```
   ./bin/flink stop --savepointPath /tmp/flink-savepoints $JOB_ID
   
   Suspending job "c74e13c841e468b0ce0c75ecc810ecf3" with a savepoint.
   Savepoint completed. Path: file:/tmp/flink-savepoints/savepoint-c74e13-8a50ac842048
   ```
   But if you only take a savepoint and do **NOT** terminate the job, `flink.max-committed-checkpoint-id` is set to the expected value.
   
   ```
   ./bin/flink savepoint \
   $JOB_ID \
   /tmp/flink-savepoints
   ```
   
   One way to bypass this issue:
   - Take a manual savepoint and then cancel the job, instead of creating the savepoint as part of job stop/terminate.
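   
   The savepoint-then-cancel workaround could be scripted roughly like this. It is only a sketch: `savepoint_then_cancel` and the `FLINK_BIN` override are hypothetical names introduced here, and it assumes the `savepoint` and `cancel` CLI subcommands behave as in the commands shown above.
   
   ```shell
   # Hypothetical override for the Flink CLI location; defaults to ./bin/flink.
   FLINK_BIN="${FLINK_BIN:-./bin/flink}"

   # Take a savepoint while the job keeps running, then cancel the job.
   # This avoids `flink stop --savepointPath`, which is the path that
   # triggers the problem described in this comment.
   savepoint_then_cancel() {
     job_id="$1"
     target_dir="$2"
     # Step 1: manual savepoint; abort if it fails so the job is not
     # cancelled without a savepoint to restore from.
     "$FLINK_BIN" savepoint "$job_id" "$target_dir" || return 1
     # Step 2: cancel the job only after the savepoint succeeded.
     "$FLINK_BIN" cancel "$job_id"
   }
   ```
   
   It would be called as `savepoint_then_cancel "$JOB_ID" /tmp/flink-savepoints`.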
   
   For metadata files that are already corrupted, overwriting `flink.max-committed-checkpoint-id` in the Iceberg metadata with the expected value might be one possible fix (though not the best one).
   
   Any other suggestions?
   
   @pnowojski the problem does not go away if I break the operator chain between `IcebergStreamWriter` and `IcebergFilesCommitter`.
   
   @pnowojski is there a way to take a savepoint and suspend the job, instead of terminating it?
   
   Currently, the command `./bin/flink stop --savepointPath /tmp/flink-savepoints $JOB_ID` takes a savepoint and then terminates the job.
   
   A way to say `take savepoint and stop/cancel the job` (where the job will later be restarted from this savepoint) might be helpful here. Because **the job is a streaming job**, stopping it should not terminate it: `endInput()` should **not** be called. Only the savepoint should be taken, and the job stopped/canceled.
   
   Is there a way to achieve this in Flink `1.11` or `1.12`?




