Hi all,

Yes, we already had a discussion under the JIRA [1]. Here are my thoughts:
+1 for the motivation, since some operators (sources or sinks) may need to perform long-running operations that should complete before the checkpoint is materialized.

One suggestion regarding the API design: instead of introducing a new method `asyncOperation` only for the asynchronous invocation, could we provide a single `snapshotStateAsync` method that performs the synchronous part and returns a `RunnableFuture<Void>` or `RunnableFuture<Boolean>` for the asynchronous part that follows? I also think we should deprecate and eventually remove the original `snapshotState`, since the new method subsumes it.

@Piotr Nowojski <pnowoj...@apache.org> WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-37375

Best,
Zakelly

On Wed, Mar 26, 2025 at 8:40 PM jufang he <hejufang0...@gmail.com> wrote:

> Hi devs,
>
> I would like to start a discussion about FLIP-XXX: Checkpoint supports the
> Operator to customize asynchronous operation [1].
>
> In some Flink task operators, slow operations such as file uploads or data
> flushing may be performed during the synchronous phase of the checkpoint.
> Due to performance issues with external storage components, the synchronous
> phase may take too long to execute, significantly impacting the task's
> throughput.
> To address this issue, I propose supporting custom asynchronous operations
> for operators, allowing users to move time-consuming operations from the
> synchronous phase to the asynchronous phase of the checkpoint, thereby
> minimizing blocking of the main thread and improving task throughput.
>
> For more details, please check the FLIP [1]. There is also a Jira issue
> about this [2].
>
> Looking forward to any comments and opinions!
>
> Best Regards,
> Jufang He
>
> [1]
> https://docs.google.com/document/d/1lwxLEQjD6jVhZUBMRGhzQNWKSvdbPbYNQsV265gR4kw
>
> [2] https://issues.apache.org/jira/browse/FLINK-37375
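For illustration, the `snapshotStateAsync` suggestion could look roughly like the Java sketch below. It is not Flink code: `AsyncSnapshotOperator`, `BufferingSink`, and the in-memory "upload" are hypothetical names made up for this example, and the real signature (parameters, `Void` vs. `Boolean` result) is still open for discussion. The method body is the synchronous part; the returned `FutureTask` (a standard `java.util.concurrent.RunnableFuture`) carries the slow part that the runtime would run asynchronously.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RunnableFuture;

/**
 * Hypothetical operator-side hook (NOT an existing Flink interface): the method body
 * runs synchronously on the task thread, and the returned future holds the part that
 * may run asynchronously after the synchronous phase finishes.
 */
interface AsyncSnapshotOperator {
    RunnableFuture<Void> snapshotStateAsync(long checkpointId) throws Exception;
}

/** Hypothetical sink that buffers records and "uploads" them at checkpoint time. */
class BufferingSink implements AsyncSnapshotOperator {
    private final List<String> buffer = new ArrayList<>();
    // Stands in for external storage; synchronized because the upload runs on another thread.
    private final List<String> uploaded = Collections.synchronizedList(new ArrayList<>());

    void write(String record) {
        buffer.add(record);
    }

    @Override
    public RunnableFuture<Void> snapshotStateAsync(long checkpointId) {
        // Synchronous part: cheaply hand off the current buffer so the task thread
        // can keep accepting records.
        List<String> toUpload = new ArrayList<>(buffer);
        buffer.clear();
        // Asynchronous part: the slow work, deferred until the future is run.
        return new FutureTask<>(() -> {
            uploaded.addAll(toUpload); // placeholder for a slow upload to external storage
            return null;
        });
    }

    int bufferSize() { return buffer.size(); }

    int uploadedCount() { return uploaded.size(); }
}

public class SnapshotStateAsyncSketch {
    public static void main(String[] args) throws Exception {
        BufferingSink sink = new BufferingSink();
        sink.write("a");
        sink.write("b");

        RunnableFuture<Void> asyncPart = sink.snapshotStateAsync(1L); // sync phase returns quickly
        sink.write("c"); // the task thread keeps processing records

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(asyncPart); // async phase runs off the task thread
        asyncPart.get();             // wait for completion, e.g. before acknowledging the checkpoint
        executor.shutdown();

        System.out.println("uploaded=" + sink.uploadedCount() + ", buffered=" + sink.bufferSize());
    }
}
```

Running it prints `uploaded=2, buffered=1`: the record written after the synchronous part stays in the buffer for the next checkpoint. Returning a `RunnableFuture` keeps both phases in one method, which is why the original `snapshotState` would become redundant.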