[
https://issues.apache.org/jira/browse/FLINK-25470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525926#comment-17525926
]
Roman Khachatryan commented on FLINK-25470:
-------------------------------------------
Thanks for the analysis [~masteryhx] and sorry for the late reply.
> According to these metrics, we could roughly infer:
> 1. restore time, from the full size of the materialization part and the
> non-materialization part
For that, we need to collect metrics for the whole checkpoint, right? (which is
non-trivial)
Or do you propose to expose them at the subtask level, gather them via
reporters, and then correlate metrics from different tasks by time?
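To illustrate point 1, a rough restore-time estimate could be derived from the two sizes once they are collected. This is a minimal sketch, not Flink API: the class, method names, and the throughput constants are all assumptions chosen for the example.

```java
// Sketch: rough restore-time estimate from the materialized and
// non-materialized (changelog) state sizes of a checkpoint.
// All names and throughput figures are hypothetical, not Flink API.
public class RestoreTimeEstimate {

    // Assumed DFS download and changelog replay throughput, bytes/second.
    static final double DOWNLOAD_BYTES_PER_SEC = 100 * 1024 * 1024; // 100 MiB/s
    static final double REPLAY_BYTES_PER_SEC = 20 * 1024 * 1024;    // 20 MiB/s

    /** Restore = download both parts + replay the changelog part. */
    static double estimateRestoreSeconds(long materializedBytes, long changelogBytes) {
        double download = (materializedBytes + changelogBytes) / DOWNLOAD_BYTES_PER_SEC;
        double replay = changelogBytes / REPLAY_BYTES_PER_SEC;
        return download + replay;
    }

    public static void main(String[] args) {
        long materialized = 10L * 1024 * 1024 * 1024; // 10 GiB materialized
        long changelog = 2L * 1024 * 1024 * 1024;     // 2 GiB changelog
        System.out.printf("estimated restore: %.1f s%n",
                estimateRestoreSeconds(materialized, changelog));
    }
}
```

The point of the sketch is only that the changelog part is counted twice (download and replay), which is why the two sizes need to be exposed separately.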
> 2. when a checkpoint includes a new materialization, from the
> incremental/full size of the materialization part.
This shouldn't change much after FLINK-26306
> 3. the cleanup efficiency of the non-materialization part, by comparing its
> full size (the real size) with the actual size in the DFS.
I think it's better to explicitly expose cleanup-related metrics.
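For point 3, the quoted comparison could be condensed into a single ratio metric. Again a minimal sketch under stated assumptions: the class and method names are hypothetical, not existing Flink metrics.

```java
// Sketch: cleanup efficiency of the non-materialized (changelog) part.
// "fullBytes" is the logical size still referenced by checkpoints;
// "retainedDfsBytes" is what actually sits on the DFS, including
// not-yet-truncated garbage. Names are hypothetical, not Flink API.
public class ChangelogCleanupEfficiency {

    /** 1.0 means no garbage on the DFS; lower values mean cleanup lags behind. */
    static double cleanupEfficiency(long fullBytes, long retainedDfsBytes) {
        if (retainedDfsBytes <= 0) {
            return 1.0; // nothing retained, nothing left to clean up
        }
        return (double) fullBytes / retainedDfsBytes;
    }

    public static void main(String[] args) {
        // e.g. 2 GiB still referenced, 5 GiB retained on the DFS
        System.out.println(cleanupEfficiency(2L << 30, 5L << 30));
    }
}
```

A ratio like this would make cleanup lag visible directly, instead of asking operators to correlate two raw size metrics themselves.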
> Add/Expose/Differentiate metrics of checkpoint size between changelog size vs
> materialization size
> --------------------------------------------------------------------------------------------------
>
> Key: FLINK-25470
> URL: https://issues.apache.org/jira/browse/FLINK-25470
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Metrics, Runtime / State Backends
> Reporter: Yuan Mei
> Assignee: Hangxiang Yu
> Priority: Major
> Fix For: 1.16.0
>
> Attachments: Screen Shot 2021-12-29 at 1.09.48 PM.png
>
>
> FLINK-25557 only resolves part of the problems.
> Eventually, we should answer questions:
> * How much the data size increases/explodes
> * When a checkpoint includes a new Materialization
> * Materialization size
> * changelog sizes since the last complete checkpoint (from which restore
> time can be roughly inferred)
>
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)