[
https://issues.apache.org/jira/browse/FLINK-37375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jufang He updated FLINK-37375:
------------------------------
Summary: Checkpoint supports the Operator to customize asynchronous
operation (was: Checkpoint supports the Operator to customize asynchronous
snapshot state)
> Checkpoint supports the Operator to customize asynchronous operation
> --------------------------------------------------------------------
>
> Key: FLINK-37375
> URL: https://issues.apache.org/jira/browse/FLINK-37375
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing
> Affects Versions: 1.20.1
> Reporter: Jufang He
> Priority: Major
> Labels: pull-request-available
>
> In some Flink operators, slow operations such as file uploads or data
> flushes run during the synchronous phase of a checkpoint. When the external
> storage system performs poorly, the synchronous phase can take too long and
> significantly reduce the job's throughput. For example, in our internal use
> of Paimon, files are uploaded to HDFS during the synchronous phase of the
> checkpoint; when an upload happened to hit a slow HDFS node, task throughput
> dropped substantially.
> To address this, I propose a generic mechanism that lets an operator define
> custom asynchronous work for a checkpoint, so that users can move
> time-consuming logic to the asynchronous phase. This minimizes blocking of
> the main thread and improves task throughput. For instance, the Paimon
> writer operator could write data locally during the synchronous phase and
> upload the files to remote storage during the asynchronous phase. Other
> operators can also run slow logic in the synchronous phase, not only Paimon
> uploads, and this approach aims to address such cases uniformly.
> I drafted a FLIP for this issue:
> [https://docs.google.com/document/d/1lwxLEQjD6jVhZUBMRGhzQNWKSvdbPbYNQsV265gR4kw]
>
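The split described in the issue (a fast local write in the synchronous phase, a slow upload in the asynchronous phase) can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use the Flink or Paimon APIs, the class and method names (TwoPhaseSnapshotSketch, snapshotSync, uploadAsync) are hypothetical, and a second local directory stands in for remote storage such as HDFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of a two-phase snapshot; not Flink or Paimon API. */
class TwoPhaseSnapshotSketch {

    /** Synchronous phase: runs on the calling (main) thread and only writes a local file. */
    static Path snapshotSync(Path localDir, long checkpointId, List<String> bufferedRecords) {
        try {
            Path localFile = localDir.resolve("chk-" + checkpointId + ".data");
            Files.write(localFile, bufferedRecords, StandardCharsets.UTF_8);
            return localFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Asynchronous phase: the potentially slow upload runs on a separate executor. */
    static CompletableFuture<Path> uploadAsync(
            Path localFile, Path remoteDir, ExecutorService asyncExecutor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // A file copy stands in for the upload to remote storage.
                Path target = remoteDir.resolve(localFile.getFileName());
                return Files.copy(localFile, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, asyncExecutor);
    }

    public static void main(String[] args) throws IOException {
        Path localDir = Files.createTempDirectory("local");
        Path remoteDir = Files.createTempDirectory("remote"); // stand-in for HDFS
        ExecutorService asyncExecutor = Executors.newSingleThreadExecutor();
        try {
            Path localFile = snapshotSync(localDir, 1L, List.of("record-1", "record-2"));
            CompletableFuture<Path> upload = uploadAsync(localFile, remoteDir, asyncExecutor);
            // The main thread is free to keep processing records here.
            Path remoteFile = upload.join(); // the checkpoint completes once the upload finishes
            System.out.println("uploaded: " + Files.exists(remoteFile));
        } finally {
            asyncExecutor.shutdown();
        }
    }
}
```

In this sketch a slow or failing upload only delays or fails the returned future, which corresponds to delaying or failing the checkpoint, while the thread that called snapshotSync returns as soon as the local write is done.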