[ https://issues.apache.org/jira/browse/FLINK-31827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715078#comment-17715078 ]
Gyula Fora commented on FLINK-31827: ------------------------------------ The first fix on the operator side is merged for this: ff4bbd2a7bbed4ba0b1443d53731c883a230b6d4 This covers most cases without additional Flink metrics. Will folow up with the optional flink metrics. > Incorrect estimation of the target data rate of a vertex when only a subset > of its upstream vertex's output is consumed > ----------------------------------------------------------------------------------------------------------------------- > > Key: FLINK-31827 > URL: https://issues.apache.org/jira/browse/FLINK-31827 > Project: Flink > Issue Type: Bug > Components: Autoscaler > Reporter: Zhanghao Chen > Assignee: Gyula Fora > Priority: Major > Labels: pull-request-available > Attachments: image-2023-04-17-23-37-35-280.png > > > Currently, the target data rate of a vertex = SUM(target data rate * > input/output ratio) for all of its upstream vertices. This assumes that all > output records of an upstream vertex is consumed by the downstream vertex. > However, it does not always hold. Consider the following job plan generated > by a Flink SQL job. The middle vertex contains multiple chained Calc(select > xx) operators, each connecting to a separate downstream sink tasks. As a > result, each sink task only consumes a sub-portion of the middle vertex's > output. > To fix it, we need operator level edge info to infer the upstream-downstream > relationship as well as operator level output metrics. The metrics part is > easy but AFAIK, there's no way to get the operator level edge info from the > Flink REST API yet. > !image-2023-04-17-23-37-35-280.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)