[
https://issues.apache.org/jira/browse/FLINK-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927134#comment-16927134
]
vinoyang commented on FLINK-14035:
----------------------------------
[~klion26] Sounds reasonable. +1
> Introduce/Change some log for snapshot to better analysis checkpoint problem
> ----------------------------------------------------------------------------
>
> Key: FLINK-14035
> URL: https://issues.apache.org/jira/browse/FLINK-14035
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.10.0
> Reporter: Congxian Qiu(klion26)
> Priority: Major
>
> Currently, most checkpoint information is logged at debug level (especially on
> the TM side). To find out which step a checkpoint has reached and how much time
> each step takes when a checkpoint fails or runs too long, we have to restart
> the job with debug logging enabled. This issue proposes to improve the
> situation by raising some existing log statements from debug to info and by
> adding a few more debug log statements. We have already made this log-level
> change in production at Alibaba, and it has caused no problems so far.
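>
> Until the levels change, debug output can be limited to the checkpoint-related
> loggers rather than enabled for the whole job. A minimal sketch, assuming the
> log4j 1.x properties format in conf/log4j.properties and assuming the relevant
> log statements live under the packages named below (neither is confirmed in
> this issue):
>
> ```properties
> # Assumed packages; adjust to where the checkpoint log statements actually live.
> # Checkpoint coordination
> log4j.logger.org.apache.flink.runtime.checkpoint=DEBUG
> # Task-side checkpoint steps
> log4j.logger.org.apache.flink.streaming.runtime.tasks=DEBUG
> # Barrier handling
> log4j.logger.org.apache.flink.streaming.runtime.io=DEBUG
> ```
>
> A change like this typically still needs a restart to take effect, which is
> the cost this issue aims to remove.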
>
> Detail
> Change the logs below from debug level to info:
> * log about {{Starting checkpoint xxx}} on the TM side
> * log about sync complete on the TM side
> * log about async complete on the TM side
> Add debug log:
> * log about receiving the barrier in exactly-once mode, to align with
> at-least-once mode
>
> If this issue is valid, then I'm happy to contribute it.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)