Hey all,

I need to make sense of this behavior.  Any help would be appreciated.

Here’s an example of a set of Flink checkpoint metrics I don’t understand.
This is the first operator in a job and as you can see the end-to-end time
for the checkpoint is long, but it’s not explained by either sync, async,
or alignment times.  I’m not sure what to make of this.  It makes me think
I don’t understand the meaning of the metrics themselves.  In my
interpretation the end-to-end time should always be, roughly, the sum of
the other components — certainly in the case of a source task such as this.
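
To make that expectation concrete, here is a minimal sketch of the
arithmetic I have in mind.  The function name and the numbers are made up
for illustration; they are not the values from our job, and this is only my
reading of the metrics, not a statement of how Flink computes them.

```python
def unexplained_ms(end_to_end_ms, sync_ms, async_ms, alignment_ms):
    """Return the part of the end-to-end time not covered by the
    sync, async, and alignment components (all in milliseconds)."""
    return end_to_end_ms - (sync_ms + async_ms + alignment_ms)


# Hypothetical values for a source task (a source has no input channels,
# so I would expect alignment to be zero).
gap = unexplained_ms(end_to_end_ms=60_000, sync_ms=5, async_ms=200, alignment_ms=0)
print(f"unexplained: {gap} ms")
```

If my interpretation were right, the gap would be close to zero.  What we
see instead is a gap that makes up most of the end-to-end time.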

Any thoughts or clarifications anyone can provide on this?  We have many
jobs with slow checkpoints whose metrics look similar: a long end-to-end
time that the other components don't account for.

-Jamie
