[ https://issues.apache.org/jira/browse/SPARK-23966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das resolved SPARK-23966. ----------------------------------- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 21048 [https://github.com/apache/spark/pull/21048] > Refactoring all checkpoint file writing logic in a common interface > ------------------------------------------------------------------- > > Key: SPARK-23966 > URL: https://issues.apache.org/jira/browse/SPARK-23966 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming > Affects Versions: 2.3.0 > Reporter: Tathagata Das > Assignee: Tathagata Das > Priority: Major > Fix For: 3.0.0 > > > Checkpoint files (offset log files, state store files) in Structured > Streaming must be written atomically such that no partial files are generated > (would break fault-tolerance guarantees). Currently, there are 3 locations > which try to do this individually, and in some cases, incorrectly. > # HDFSOffsetMetadataLog - This uses a FileManager interface to use any > implementation of `FileSystem` or `FileContext` APIs. It preferably loads > `FileContext` implementation as FileContext of HDFS has atomic renames. > # HDFSBackedStateStore (aka in-memory state store) > ## Writing a version.delta file - This uses FileSystem APIs only to perform > a rename. This is incorrect as rename is not atomic in HDFS FileSystem > implementation. > ## Writing a snapshot file - Same as above. > Current problems: > # State Store behavior is incorrect - > # Inflexible - Some file systems provide mechanisms other than > write-to-temp-file-and-rename for writing atomically and more efficiently. > For example, with S3 you can write directly to the final file and it will be > made visible only when the entire file is written and closed correctly. Any > failure can be made to terminate the writing without making any partial files > visible in S3. The current code does not abstract out this mechanism enough > that it can be customized. > Solution: > # Introduce a common interface that all 3 cases above can use to write > checkpoint files atomically. > # This interface must provide the necessary interfaces that allow > customization of the write-and-rename mechanism. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org