[jira] [Commented] (FLINK-34213) Consider using accumulated busy time instead of busyMsPerSecond

Maximilian Michels (Jira) Tue, 23 Jan 2024 07:21:40 -0800


    [ https://issues.apache.org/jira/browse/FLINK-34213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810006#comment-17810006 ]


Maximilian Michels commented on FLINK-34213:
--------------------------------------------

If we had to query metrics per vertex, that would be too expensive, but that
does not seem to be necessary. Here is an example REST API response from the
{{/jobs/<jobId>}} endpoint:

{noformat}
{
    "jid": "b4f918c2a0312de9fe7369a7db093e96",
    "name": "-----",
    "isStoppable": false,
    "state": "RUNNING",
    "start-time": 1705094021727,
    "end-time": -1,
    "duration": 928985186,
    "maxParallelism": 10000,
    "now": 1706023006913,
    "timestamps": {
        "SUSPENDED": 0,
        "RUNNING": 1705094036134,
        "FAILING": 0,
        "CANCELED": 0,
        "CANCELLING": 0,
        "CREATED": 1705094035034,
        "INITIALIZING": 1705094021727,
        "FAILED": 0,
        "RESTARTING": 0,
        "RECONCILING": 0,
        "FINISHED": 0
    },
    "vertices": [
        {
            "id": "db1f263dc155338dc2a9622a2e06d115",
            "name": "----",
            "maxParallelism": 10000,
            "parallelism": 18,
            "status": "RUNNING",
            "start-time": 1705094037437,
            "end-time": -1,
            "duration": 928969476,
            "tasks": {
                "CANCELED": 0,
                "DEPLOYING": 0,
                "CANCELING": 0,
                "RECONCILING": 0,
                "FINISHED": 0,
                "SCHEDULED": 0,
                "CREATED": 0,
                "INITIALIZING": 0,
                "FAILED": 0,
                "RUNNING": 18
            },
            "metrics": {
                "read-bytes": 0,
                "read-bytes-complete": true,
                "write-bytes": 2907138853415272,
                "write-bytes-complete": true,
                "read-records": 0,
                "read-records-complete": true,
                "write-records": 229589536334,
                "write-records-complete": true,
                "accumulated-backpressured-time": 1533744940,
                "accumulated-idle-time": 10026044858,
                "accumulated-busy-time": 5161601268
            }
        },
   ...........
    ]
}
{noformat}
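One request per job is enough to read these counters for every vertex. The following is a minimal sketch of that, not code from Flink or the Kubernetes Operator: the class `JobDetailsBusyTime`, the methods `fetchJobDetails` and `accumulatedBusyTimes`, and the `restBaseUrl` parameter (the JobManager's REST address) are hypothetical. To stay dependency-free it pairs each vertex {{id}} with the next {{accumulated-busy-time}} using a regex, which assumes every vertex carries that metric; a real client should parse the response with a proper JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobDetailsBusyTime {

    // Hypothetical shortcut: captures a vertex "id" and the first
    // "accumulated-busy-time" value that follows it in the document.
    private static final Pattern VERTEX_BUSY_TIME = Pattern.compile(
            "(?s)\"id\"\\s*:\\s*\"([^\"]+)\".*?\"accumulated-busy-time\"\\s*:\\s*(\\d+)");

    /** Fetches the /jobs/<jobId> document once; it covers all vertices of the job. */
    public static String fetchJobDetails(String restBaseUrl, String jobId)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(restBaseUrl + "/jobs/" + jobId))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Maps vertex id to its accumulated busy time, in document order. */
    public static Map<String, Long> accumulatedBusyTimes(String jobDetailsJson) {
        Map<String, Long> result = new LinkedHashMap<>();
        Matcher m = VERTEX_BUSY_TIME.matcher(jobDetailsJson);
        while (m.find()) {
            result.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "{\"vertices\":[{\"id\":\"a\",\"metrics\":{\"accumulated-busy-time\": 5161601268}},"
                + "{\"id\":\"b\",\"metrics\":{\"accumulated-busy-time\": 42}}]}";
        System.out.println(accumulatedBusyTimes(sample)); // {a=5161601268, b=42}
    }
}
```

Note that the top-level {{jid}} key does not match the {{"id"}} pattern, so only vertex entries are picked up.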

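Note the accumulated backpressured, idle, and busy time counters in each
vertex's {{metrics}}: a single job-level request returns them for all vertices.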
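A quick arithmetic check on the sample vertex suggests that the three counters are disjoint, are reported in milliseconds, and are summed over all subtasks. Their total nearly equals parallelism $p = 18$ times the vertex duration $d$. This is an inference from these numbers only, not a documented guarantee:

```latex
\begin{aligned}
T_{\text{busy}} + T_{\text{idle}} + T_{\text{bp}}
  &= 5\,161\,601\,268 + 10\,026\,044\,858 + 1\,533\,744\,940 = 16\,721\,391\,066 \\
p \cdot d &= 18 \times 928\,969\,476 = 16\,721\,450\,568 \\
\frac{T_{\text{busy}}}{T_{\text{busy}} + T_{\text{idle}} + T_{\text{bp}}} &\approx 0.309
\end{aligned}
```

That is roughly 309 busy ms per second, averaged over the job's whole lifetime. A lifetime average says little about the current load, which is why the proposal uses the difference between two successive readings.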

> Consider using accumulated busy time instead of busyMsPerSecond
> ---------------------------------------------------------------
>
>                 Key: FLINK-34213
>                 URL: https://issues.apache.org/jira/browse/FLINK-34213
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Maximilian Michels
>            Priority: Minor
>
> We might achieve much better accuracy if we used the accumulated busy time 
> metrics from Flink, instead of the momentarily collected ones.
> We would use the diff between the last accumulated and the current 
> accumulated busy time.
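A minimal sketch of the diff-based calculation described above, under stated assumptions. It is not the autoscaler's actual code: the class `BusyTimeDelta`, the `Sample` record, and its fields are hypothetical. It also assumes that {{accumulated-busy-time}} is summed over all subtasks (consistent with the totals in the sample response), and it uses the response's {{now}} field as the sample timestamp.

```java
public class BusyTimeDelta {

    /** Hypothetical snapshot of one vertex, taken from a /jobs/<jobId> response. */
    public record Sample(long nowMs, long accumulatedBusyTimeMs, int parallelism) {}

    /**
     * Average busy milliseconds per second per subtask between two samples.
     * Returns NaN when the window is empty, the parallelism changed, or the
     * counter went backwards (e.g. after a restart), since the delta is then meaningless.
     */
    public static double busyMsPerSecond(Sample previous, Sample current) {
        long elapsedMs = current.nowMs() - previous.nowMs();
        long busyDeltaMs = current.accumulatedBusyTimeMs() - previous.accumulatedBusyTimeMs();
        if (elapsedMs <= 0 || busyDeltaMs < 0 || previous.parallelism() != current.parallelism()) {
            return Double.NaN;
        }
        // Divide the summed delta by parallelism to get a per-subtask average,
        // then normalize to milliseconds of busy time per second of wall-clock time.
        return (double) busyDeltaMs / current.parallelism() / elapsedMs * 1000.0;
    }

    public static void main(String[] args) {
        Sample t0 = new Sample(1_000_000L, 5_000_000L, 18);
        Sample t1 = new Sample(1_060_000L, 5_540_000L, 18); // 60 s later
        System.out.println(busyMsPerSecond(t0, t1)); // 500.0
    }
}
```

Unlike a single busyMsPerSecond reading, this covers the whole interval between two metric collections. Because the counter is summed over subtasks, though, the result is an average and hides skew between subtasks.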



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
