elkhand commented on issue #2033: URL: https://github.com/apache/iceberg/issues/2033#issuecomment-767056823
Thank you @kezhuw @pnowojski

This is the call order of `endInput()`:

New findings: this issue occurs when you take a savepoint with a command that also terminates the job:

```
./bin/flink stop --savepointPath /tmp/flink-savepoints $JOB_ID
Suspending job "c74e13c841e468b0ce0c75ecc810ecf3" with a savepoint.
Savepoint completed. Path: file:/tmp/flink-savepoints/savepoint-c74e13-8a50ac842048
```

But if you just take a savepoint and do **NOT** terminate the job, `flink.max-committed-checkpoint-id` is set to the expected value.

```
./bin/flink savepoint \
    $JOB_ID \
    /tmp/flink-savepoints
```

One way to bypass this issue is to take a manual savepoint and then cancel the job, instead of creating the savepoint with job stop/terminate.

For already corrupted metadata files, fixing the Iceberg metadata files by overwriting `flink.max-committed-checkpoint-id` with the expected value might be one possible fix (though not the best one). Any other suggestions?

@pnowojski the problem does not go away if I separate the chaining between `IcebergStreamWriter` and `IcebergFilesCommitter`.

@pnowojski is there a way to take a savepoint and suspend the job, instead of terminating it? Currently, the command `./bin/flink stop --savepointPath /tmp/flink-savepoints $JOB_ID` takes a savepoint and terminates the job. If there were a way to say `take savepoint and stop/cancel the job` (the job will be restarted from this savepoint in the future), that might be helpful here. Because **the job is a streaming job**, when we stop it we do not want it to be terminated (that is, we do **not** want `endInput()` to be called); we want the savepoint to be taken and the job to be stopped/canceled. Is there a way to achieve this in Flink `1.11` or `1.12`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
