yzeng1618 opened a new issue, #10077:
URL: https://github.com/apache/seatunnel/issues/10077

   ### Search before asking
   
   - [x] I had searched in the 
[feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22)
 and found no similar feature requirement.
   
   
   ### Description
   
   Currently, when a large-volume job running on SeaTunnel’s Flink engine fails 
during execution, there is no built-in way to resume from where the job 
stopped. After a failure, we have to restart the job from the beginning to 
continue the synchronization. This is very time-consuming for big datasets and 
may lead to repeated processing or data inconsistency.
   
   I hope SeaTunnel’s Flink engine can support a checkpoint-based 
fault-tolerance mechanism (using Flink checkpoints/savepoints) so that failed 
jobs can be restarted from the last successful checkpoint instead of starting 
over.
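
To make the requested behavior concrete, here is a minimal Python sketch of resuming from the last successful checkpoint. It is a toy model only, not SeaTunnel or Flink code: `ToyJob`, `checkpoint_every`, and `fail_at` are hypothetical names invented for illustration, and the "checkpoint" is just a stored read offset.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJob:
    """Toy stand-in for a long-running sync job; not SeaTunnel or Flink code."""
    total_rows: int
    checkpoint_every: int
    last_checkpoint: int = 0   # offset stored by the last successful checkpoint
    rows_read: int = 0         # total reads, including repeated ones
    written: list = field(default_factory=list)

    def run(self, fail_at=None):
        """Read from the last checkpoint; return True when the job finishes."""
        # Discard output produced after the last checkpoint (uncommitted work).
        del self.written[self.last_checkpoint:]
        for offset in range(self.last_checkpoint, self.total_rows):
            if fail_at is not None and offset == fail_at:
                return False  # simulated failure (network issue, sink timeout)
            self.written.append(offset)
            self.rows_read += 1
            if (offset + 1) % self.checkpoint_every == 0:
                self.last_checkpoint = offset + 1
        return True


job = ToyJob(total_rows=1_000, checkpoint_every=100)
finished = job.run(fail_at=950)  # fails near the end of the load
resumed = job.run()              # restart resumes at offset 900, not 0
```

In this toy run the job reads 1,050 rows in total (950 before the failure plus 100 after resuming), whereas restarting from the beginning would read 1,950.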
   
   ### Usage Scenario
   
   1. Initial full data load / large backfill
   We use SeaTunnel with the Flink engine to do a full data synchronization 
from a large source table (or multiple tables) to a data warehouse / OLAP 
system. A single job may run for many hours.
   If the job fails near the end (for example due to network issues, cluster 
problems, or sink timeouts), we currently have to restart from the beginning. 
This wastes a lot of time and cluster resources, and may also put extra 
pressure on the source system. With checkpoint support, the job could resume 
from the last successful checkpoint instead of re-reading all historical data.
   
   2. Long-running stateful transformations
   In some pipelines we use Flink’s windowing, aggregation, or join operations, 
which maintain large state. Without proper checkpointing, any failure will 
cause all in-memory state to be lost and all results to be recomputed.
   By configuring checkpoints for these stateful SeaTunnel–Flink jobs, we can 
persist the state periodically, shorten recovery time after failures, and 
improve the reliability of long-running big-data tasks.
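
The second scenario can be sketched the same way. The following toy Python example (again hypothetical, not Flink or SeaTunnel code) keeps a per-key count as operator state, snapshots the state together with the input position every few events, and restores both after a simulated failure. The `events` list, `snapshot_every`, and `fail_at` are made-up names and data for illustration.

```python
import copy
from collections import Counter

events = ["a", "b", "a", "c", "b", "a", "c", "a"]  # made-up input stream


def run(events, snapshot_every, snapshot=None, fail_at=None):
    """Count events per key; return (state, latest_snapshot, finished)."""
    # Restore operator state and input position from the snapshot, if any.
    offset, state = copy.deepcopy(snapshot) if snapshot else (0, Counter())
    for i in range(offset, len(events)):
        if fail_at is not None and i == fail_at:
            return None, snapshot, False  # in-memory state is lost on failure
        state[events[i]] += 1
        if (i + 1) % snapshot_every == 0:
            # Persist a copy of the state together with the next offset to read.
            snapshot = (i + 1, copy.deepcopy(state))
    return state, snapshot, True


_, snap, ok_first = run(events, snapshot_every=3, fail_at=7)
state, _, ok_second = run(events, snapshot_every=3, snapshot=snap)
```

After the restore, only the events past the snapshot position are reprocessed, and the final counts match what an uninterrupted run would produce.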
   
   ### Related issues
   
   no
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


