Thanks for driving this effort, xiangyu!
The proposal overall LGTM.
I just have a small question. There are other places in Flink that interact
with external storage. Should we consider adding a general retry mechanism
to them?
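
To make the question concrete, here is a rough sketch of what a shared helper could look like. It is only an illustration: `RetrySketch`, `callWithRetry`, and its parameters are hypothetical names, not existing Flink APIs, and the fixed-interval policy simply mirrors the retry times / retry interval options described in the FLIP.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Flink: retries an action at a fixed interval. */
public final class RetrySketch {

    private RetrySketch() {}

    /**
     * Runs the action once and, on failure, up to maxRetries more times,
     * sleeping for the given interval between attempts.
     * maxRetries = 0 means "no retry", i.e. a single attempt.
     */
    public static <T> T callWithRetry(Callable<T> action, int maxRetries, Duration interval)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(interval.toMillis());
                }
            }
        }
        // All attempts failed; surface the last failure to the caller.
        throw last;
    }
}
```

Any call that talks to external storage could then be wrapped as `RetrySketch.callWithRetry(() -> doTransfer(), times, interval)`, where `doTransfer` stands in for the actual upload or download.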

On Mon, Jan 8, 2024 at 11:31, xiangyu feng <xiangyu...@gmail.com> wrote:

> Hi devs,
>
> I'm opening this thread to discuss FLIP-414: Support Retry Mechanism in
> RocksDBStateDataTransfer[1].
>
> Currently, there is no retry mechanism for downloading and uploading
> RocksDB state files. Any jitter in the remote filesystem might lead to a
> checkpoint failure. By supporting a retry mechanism in
> `RocksDBStateDataTransfer`, we can significantly reduce the checkpoint
> failure rate during the asynchronous phase.
>
> To make this retry mechanism configurable, we have introduced two options
> in this FLIP: `state.backend.rocksdb.checkpoint.transfer.retry.times` and
> `state.backend.rocksdb.checkpoint.transfer.retry.interval`. By default, no
> retry is performed, which keeps the behavior consistent with the original.
>
> Looking forward to your feedback, thanks.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-414%3A+Support+Retry+Mechanism+in+RocksDBStateDataTransfer
>
> Best regards,
> Xiangyu Feng
>


-- 
Best,
Yue
