kbendick opened a new pull request #3106:
URL: https://github.com/apache/iceberg/pull/3106


   We are occasionally seeing CI runs take 6 hours and then ultimately time out.
   
   After adding further logging, it seems that there is a Flink test that is 
still trying to checkpoint after the job has entered the FINISHED state.
   
   I'm not 100% sure whether adding this config will help (the aborted 
checkpoints might not be counted as checkpoint failures), but it's worth a shot 
for further debugging. Ultimately, we should fix the underlying issue, but for 
now I just want to see whether this helps.
   
   Further details (and logs) can be found here: 
https://github.com/apache/iceberg/issues/3091
   
   The relevant log message, which is repeated for hours until the timeout, is:
   ```
   2021-09-13T08:19:47.7896411Z > Task :iceberg-flink:test
   2021-09-13T08:19:47.7899950Z     [Checkpoint Timer] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 
triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink 
-> rightIcebergSink-IcebergStreamWriter (1/1) of job 
437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. 
Aborting checkpoint.
   2021-09-13T08:19:47.7905489Z     [Checkpoint Timer] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 
triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink 
-> rightIcebergSink-IcebergStreamWriter (1/1) of job 
437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. 
Aborting checkpoint.
   2021-09-13T08:19:47.7914766Z     [Checkpoint Timer] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 
triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink 
-> rightIcebergSink-IcebergStreamWriter (1/1) of job 
437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. 
Aborting checkpoint.
   2021-09-13T08:19:47.7920502Z     [Checkpoint Timer] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 
triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink 
-> rightIcebergSink-IcebergStreamWriter (1/1) of job 
437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. 
Aborting checkpoint.
   ```
   
   cc @nastra @openinx @rdblue @RussellSpitzer @stevenzwu in case you have any 
insight on how to resolve this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


