[
https://issues.apache.org/jira/browse/FLINK-31826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713178#comment-17713178
]
Gyula Fora commented on FLINK-31826:
------------------------------------
The same issue can also be easily reproduced with side outputs. The main
problem is that Flink does not provide in/out record metrics per target
job vertex, only “aggregated” ones.
I have a prototype fix that would add the missing metrics on the Flink side, and
the autoscaler can then be improved to use that information from Flink 1.18
onward.
> Incorrect estimation of the target data rate of a vertex when only a subset
> of its upstream vertex's output is consumed
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-31826
> URL: https://issues.apache.org/jira/browse/FLINK-31826
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler
> Reporter: Zhanghao Chen
> Priority: Major
> Attachments: LHL7VKOG4B.jpg
>
>
> Currently, a vertex's target data rate = the sum, over its upstream vertices,
> of (target data rate * input/output ratio). This assumes that all of an
> upstream vertex's output goes into the current vertex. However, this
> assumption does not always hold.
> Consider the following job plan generated by a Flink SQL job. The vertex in
> the middle has multiple Calc(select xx) operators chained, each connected to a
> separate downstream task. The total num_rec_out_rate of the middle vertex =
> SUM num_rec_in_rate of its downstream tasks, so each downstream task receives
> only part of that output.
> To fix this problem, we need operator-level output metrics and edge info. The
> operator-level metrics part is easy, but AFAIK there is no way to get the
> operator-level edge info from the current Flink REST APIs.
> !LHL7VKOG4B.jpg!
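As a rough illustration of the estimation problem described in the issue, the sketch below compares the aggregated rule with a per-edge variant. Everything in it is an assumption made for illustration: the function names, the vertex names (sink_a, sink_b, sink_c), and the rates are hypothetical and are not taken from the autoscaler code or from the attached job plan. The ratio is interpreted here as records out per record in of the upstream vertex.

```python
# Minimal sketch, NOT the autoscaler implementation. All names and numbers
# below are hypothetical and chosen only to make the arithmetic easy to follow.


def estimate_aggregated(upstream_target_rate, upstream_in_rate, upstream_total_out_rate):
    """Rule from the issue description: scale the upstream target rate by the
    upstream vertex's total output/input ratio, as if all of its output
    flowed into this one downstream vertex."""
    return upstream_target_rate * (upstream_total_out_rate / upstream_in_rate)


def estimate_per_edge(upstream_target_rate, upstream_in_rate, edge_out_rate):
    """Assumed alternative: use only the records emitted on the edge that
    leads to this downstream vertex (requires per-target-vertex metrics)."""
    return upstream_target_rate * (edge_out_rate / upstream_in_rate)


# Hypothetical middle vertex: reads 1000 rec/s now, must handle 2000 rec/s.
upstream_in_rate = 1000.0
upstream_target_rate = 2000.0

# Hypothetical output rate per downstream vertex (one chained Calc each).
edge_out_rates = {"sink_a": 1000.0, "sink_b": 500.0, "sink_c": 500.0}
total_out_rate = sum(edge_out_rates.values())

aggregated = {
    vertex: estimate_aggregated(upstream_target_rate, upstream_in_rate, total_out_rate)
    for vertex in edge_out_rates
}
per_edge = {
    vertex: estimate_per_edge(upstream_target_rate, upstream_in_rate, rate)
    for vertex, rate in edge_out_rates.items()
}

for vertex in edge_out_rates:
    print(f"{vertex}: aggregated={aggregated[vertex]:.0f} rec/s, "
          f"per-edge={per_edge[vertex]:.0f} rec/s")
```

With these made-up numbers, the aggregated rule assigns every downstream vertex the full 4000 rec/s, while the per-edge estimates (2000, 1000, and 1000 rec/s) add up to that same total. This is the overestimation the issue describes when a downstream vertex consumes only a subset of the upstream output.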