yzeng1618 opened a new issue, #10077: URL: https://github.com/apache/seatunnel/issues/10077
### Search before asking

- [x] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

Currently, when using the Flink engine in SeaTunnel, large-volume jobs may fail during execution, but there is no built-in way to resume from where the job stopped. After a failure, we have to restart the job from the beginning to continue the synchronization. This is very time-consuming for big datasets and may lead to repeated processing or data inconsistency.

I hope SeaTunnel's Flink engine can support a checkpoint-based fault-tolerance mechanism (using Flink checkpoints/savepoints) so that failed jobs can be restarted from the last successful checkpoint instead of starting over.

### Usage Scenario

1. Initial full data load / large backfill

   We use SeaTunnel with the Flink engine to do a full data synchronization from a large source table (or multiple tables) to a data warehouse / OLAP system. A single job may run for many hours. If the job fails near the end (for example due to network issues, cluster problems, or sink timeouts), we currently have to restart from the beginning. This wastes a lot of time and cluster resources, and may also put extra pressure on the source system. With checkpoint support, the job could resume from the last successful checkpoint instead of re-reading all historical data.

2. Long-running stateful transformations

   In some pipelines we use Flink's windowing, aggregation, or join operations, which maintain large state. Without proper checkpointing, any failure will cause all in-memory state to be lost and all results to be recomputed. By configuring checkpoints for these stateful SeaTunnel-Flink jobs, we can persist the state periodically, shorten recovery time after failures, and improve the reliability of long-running big-data tasks.

### Related issues

no

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
