[ https://issues.apache.org/jira/browse/FLINK-36531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899192#comment-17899192 ]
Sai Sharath Dandi commented on FLINK-36531:
-------------------------------------------
Hi [~mxm], as [~heigebupahei] mentioned,
[FLIP-461|https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler]
already solves this problem from the scheduler side by synchronizing the
rescaling with checkpoint creation.
> AutoScaler needs to consider the lag from last checkpoint
> ---------------------------------------------------------
>
> Key: FLINK-36531
> URL: https://issues.apache.org/jira/browse/FLINK-36531
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler
> Reporter: Sai Sharath Dandi
> Priority: Major
>
> The autoscaler computes the target processing capacity as
> [below|https://sg.uberinternal.com/code.uber.internal/uber-code/[email protected]/-/blob/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/utils/AutoScalerUtils.java?L47]
> // Target = LAG/CATCH_UP + INPUT_RATE*RESTART/CATCH_UP + INPUT_RATE/TARGET_UTIL
>
> During a scaling action, the autoscaler restarts the job from the last
> successful checkpoint. We therefore need to include the number of records
> processed since the last successful checkpoint as part of the lag, because
> those records will be replayed after scaling. This is particularly important
> for jobs with long checkpoint intervals and large state, as there can be a
> significant difference between the real-time lag and the lag from the
> checkpoint.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)