Hi all,

I am working on measuring the failure recovery time of Flink, and I want to decompose it into separate parts: the time to detect the failure, the time to restart the job, and the time to restore state from the checkpoint.
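To make the decomposition concrete, here is a minimal sketch of what I have in mind. It assumes four timestamps could be collected somehow (for example, from logs); the function name and the timestamp names are hypothetical and are not Flink APIs.

```python
def decompose_recovery(t_failure, t_detected, t_restarted, t_restored):
    """Split the total recovery time into three phases.

    All arguments are hypothetical timestamps in the same unit (e.g. seconds):
      t_failure   - when the failure actually happened
      t_detected  - when the failure was detected
      t_restarted - when the job had been restarted
      t_restored  - when state had been restored from the checkpoint
    """
    if not (t_failure <= t_detected <= t_restarted <= t_restored):
        raise ValueError("timestamps must be in chronological order")
    return {
        "detect": t_detected - t_failure,
        "restart": t_restarted - t_detected,
        "restore": t_restored - t_restarted,
        "total": t_restored - t_failure,
    }


# Example with made-up timestamps (seconds).
phases = decompose_recovery(100, 103, 110, 125)
```

The three phases add up to the total by construction, so the open question is only how to obtain each of the four timestamps.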
Unfortunately, I cannot find anything in the Flink documentation that covers this. Does Flink provide a way to measure these parts separately? If not, how could I measure them myself?

Thanks a lot for your help.

Regards,
Juno