[jira] [Commented] (FLINK-31826) Incorrect estimation of the target data rate of a vertex when only a subset of its upstream vertex's output is consumed

Gyula Fora (Jira) Mon, 17 Apr 2023 09:02:45 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713178#comment-17713178
 ]


Gyula Fora commented on FLINK-31826:
------------------------------------

The same issue can also be easily reproduced with side outputs. The main 
problem is that Flink does not provide in/out record metrics per target 
job vertex, only “aggregated” ones. 



I have a prototype fix that adds the missing metrics on the Flink side. The 
autoscaler can then be improved to use that information with Flink 1.18 and 
later.

> Incorrect estimation of the target data rate of a vertex when only a subset 
> of its upstream vertex's output is consumed
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-31826
>                 URL: https://issues.apache.org/jira/browse/FLINK-31826
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>            Reporter: Zhanghao Chen
>            Priority: Major
>         Attachments: LHL7VKOG4B.jpg
>
>
> Currently, a vertex's target data rate = the sum of its upstream vertices' 
> target data rates * input/output ratio. This assumes that all of the upstream 
> vertex's output goes into the current vertex. However, this does not always 
> hold. Consider the following job plan generated by a Flink SQL job. The vertex 
> in the middle has multiple Calc(select xx) operators chained, each connected 
> to a separate downstream task. The total num_rec_out_rate of the middle 
> vertex = SUM of the num_rec_in_rate of its downstream tasks.
> To fix this problem, we need operator level output metrics and edge info. The 
> operator level metrics part is easy, but AFAIK, there's no way to get the 
> operator level edge info from the current Flink REST APIs.
> !LHL7VKOG4B.jpg!
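
The over-estimation described in the quoted issue can be illustrated with a 
small sketch. This is not the autoscaler's actual code: the function names and 
all numbers are hypothetical, chosen only to show how scaling by the aggregated 
output rate differs from scaling by the rate on a single edge when the middle 
vertex fans out to three downstream tasks.

```python
# Hypothetical illustration only; names and numbers are assumptions,
# not the Flink autoscaler's real API or measured metrics.

def estimate_with_aggregated_output(upstream_target_rate: float,
                                    upstream_in_rate: float,
                                    upstream_total_out_rate: float) -> float:
    """Current approach: assume all upstream output reaches this vertex."""
    return upstream_target_rate * (upstream_total_out_rate / upstream_in_rate)


def estimate_with_edge_output(upstream_target_rate: float,
                              upstream_in_rate: float,
                              edge_out_rate: float) -> float:
    """Per-edge approach: use only the records sent to this vertex."""
    return upstream_target_rate * (edge_out_rate / upstream_in_rate)


# The middle vertex reads 100 rec/s and forwards each record to three
# separate downstream tasks, so its aggregated output rate is 300 rec/s.
upstream_in_rate = 100.0
edge_out_rates = [100.0, 100.0, 100.0]
upstream_total_out_rate = sum(edge_out_rates)

# Assume the middle vertex's own target data rate is 200 rec/s.
upstream_target_rate = 200.0

aggregated = estimate_with_aggregated_output(
    upstream_target_rate, upstream_in_rate, upstream_total_out_rate)
per_edge = estimate_with_edge_output(
    upstream_target_rate, upstream_in_rate, edge_out_rates[0])

print(f"aggregated estimate: {aggregated} rec/s")
print(f"per-edge estimate:   {per_edge} rec/s")
```

With these made-up numbers, the aggregated estimate is three times the 
per-edge one for each downstream task, which is the kind of gap that metrics 
per target job vertex would let the autoscaler close.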



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
