zentol opened a new pull request #18566:
URL: https://github.com/apache/flink/pull/18566
FLINK-23976 added standardized metrics for capturing how much time we spend
in each JobStatus. However, certain states in practice consist of several
stages; for example the RUNNING state also includes the deployment of tasks.
To get a better picture of where time is spent, I propose to add new metrics
that capture the deployingTime based on the execution states. This will
additionally get us closer to a proper uptime metric, which ideally will be
runningTime minus the various stage time metrics.
A job is considered to be deploying:
- for batch jobs, if no task is running and at least one task is being
  deployed
- for streaming jobs, if at least one task is being deployed
The semantics are different for batch/streaming jobs because they differ in
terms of how they make progress. For a streaming job all tasks need to be
deployed for checkpointing to work. For batch jobs any deployed task
immediately starts progressing the job.
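As a rough illustration of the proposed semantics (not the code in this PR),
the check could be sketched as below. The names DeployingSketch, TaskState and
JobType are hypothetical, simplified stand-ins introduced only for this
example, and the sketch assumes the job's task states are available as a plain
collection.

```java
import java.util.Collection;

/** Hypothetical sketch of the proposed semantics; not the classes used in the PR. */
final class DeployingSketch {

    /** Simplified stand-in for a task's execution state (assumption). */
    enum TaskState { CREATED, SCHEDULED, DEPLOYING, INITIALIZING, RUNNING, FINISHED }

    /** Simplified stand-in for the job type (assumption). */
    enum JobType { BATCH, STREAMING }

    /** Returns true if the job counts as "deploying" under the proposed rules. */
    static boolean isDeploying(JobType jobType, Collection<TaskState> taskStates) {
        boolean anyDeploying = taskStates.contains(TaskState.DEPLOYING);
        if (jobType == JobType.STREAMING) {
            // Streaming: a single deploying task is enough, because all tasks
            // must be deployed before checkpointing can work.
            return anyDeploying;
        }
        // Batch: any running task already progresses the job, so the job only
        // counts as deploying while no task is running.
        boolean anyRunning = taskStates.contains(TaskState.RUNNING);
        return anyDeploying && !anyRunning;
    }
}
```

With one task RUNNING and one DEPLOYING, this sketch reports a streaming job
as deploying but a batch job as not deploying, which is the difference the
two rules are meant to capture.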
I will add documentation later once we have agreed on the semantics.