Stephan Ewen created FLINK-15603:

             Summary: Show "barrier lag" in checkpoint statistics
                 Key: FLINK-15603
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Web Frontend
            Reporter: Stephan Ewen
             Fix For: 1.11.0

One of the most important metrics is missing from the checkpoint stats: "barrier 
lag", meaning the time between when the checkpoint was triggered and when 
the barriers arrive at a task.

That time is critical for identifying whether a checkpoint takes too long because 
of backpressure or other contention.

You can implicitly calculate this by "end_to_end_time - sync_time - 
async_time", but it is much more obvious for users that something is up when 
this number is explicitly shown.
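
As a rough illustration of the implicit calculation above, here is a minimal 
sketch. The class and method names are hypothetical and not part of Flink's 
API. It assumes all three durations are given in milliseconds for the same 
subtask, and it clamps the result at zero as a defensive choice of this sketch.

```java
// Hypothetical helper for illustration only; not part of Flink's API.
final class BarrierLag {
    private BarrierLag() {}

    /**
     * Derives the barrier lag from the three durations named in the issue:
     * end_to_end_time - sync_time - async_time.
     * All arguments are assumed to be milliseconds for the same subtask.
     */
    static long barrierLagMillis(long endToEndMillis, long syncMillis, long asyncMillis) {
        long lag = endToEndMillis - syncMillis - asyncMillis;
        // Assumption of this sketch: clamp at zero in case timer granularity
        // makes the difference slightly negative.
        return Math.max(0L, lag);
    }
}
```

For example, a subtask with an end-to-end time of 10,000 ms, a sync time of 
500 ms, and an async time of 1,500 ms would have a barrier lag of 8,000 ms, 
which points at the barriers being delayed rather than at the snapshot itself.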

This message was sent by Atlassian Jira