ferenc-csaky commented on PR #25231: URL: https://github.com/apache/flink/pull/25231#issuecomment-2312059662
Thanks for the fix @gaborgsomogyi! As I tried to understand the problem we have here, I checked the history of what happened here before and why. The previous change that was made `S3RecoverableFsDataOutputStream#sync()` done in [FLINK-28513](https://issues.apache.org/jira/browse/FLINK-28513). The error msg. there states: ``` S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to create a persistent recoverable intermediate point. ``` If we check the [`persist()` implementation](https://github.com/apache/flink/blob/9bcd8f4b8f48c8d9ad05575b60779c9216ee4965/flink-filesystems/flink-s3-fs-base/src/main/java/org/apache/flink/fs/s3/common/writer/S3RecoverableFsDataOutputStream.java#L158), it will upload the local part to S3, but will not enforce closure of that local part file, neither will commit the snapshotted state. IIUC committing is crucial inside `sync()`, because it is a void method, so if we would simply call `persist()` inside, we would lose the `S3Recoverable` object it returns, and the ability to resume correctly in case of a failure. So without changing the interfaces, it is necessary to commit here and make sure we have a consistent state before we leave the `sync()` method and release the lock. Based on these conclusions, if the current solution performs good based according to the actual usage, the fix itself LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org