[ https://issues.apache.org/jira/browse/SPARK-23966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434682#comment-16434682 ]
Apache Spark commented on SPARK-23966:
--------------------------------------
User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/21048
> Refactoring all checkpoint file writing logic in a common interface
> -------------------------------------------------------------------
>
> Key: SPARK-23966
> URL: https://issues.apache.org/jira/browse/SPARK-23966
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.3.0
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Major
>
> Checkpoint files (offset log files, state store files) in Structured
> Streaming must be written atomically so that no partial files are ever
> generated (partial files would break fault-tolerance guarantees). Currently,
> three code paths each try to do this on their own, and some do it incorrectly.
> # HDFSOffsetMetadataLog - This uses a FileManager interface that can wrap any
> implementation of the `FileSystem` or `FileContext` APIs. It prefers the
> `FileContext` implementation because the HDFS FileContext has atomic renames.
> # HDFSBackedStateStore (aka in-memory state store)
> ## Writing a version.delta file - This uses only the FileSystem APIs to
> perform the rename. This is incorrect because rename is not atomic in the
> HDFS FileSystem implementation.
> ## Writing a snapshot file - Same as above.
> Current problems:
> # State Store behavior is incorrect - As described above, it relies on the
> FileSystem rename, which is not atomic in HDFS.
> # Inflexible - Some file systems provide mechanisms other than
> write-to-temp-file-and-rename for writing atomically and more efficiently.
> For example, with S3 you can write directly to the final file and it will be
> made visible only when the entire file is written and closed correctly. Any
> failure can be made to terminate the writing without making any partial files
> visible in S3. The current code does not abstract this mechanism well enough
> for it to be customized.
> Solution:
> # Introduce a common interface that all 3 cases above can use to write
> checkpoint files atomically.
> # This interface must expose the hooks needed to customize the
> write-and-rename mechanism.
>
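
For illustration only, here is a minimal Python sketch of the idea in the
description: one common interface for atomic checkpoint writes, with
write-to-temp-file-and-rename as just one pluggable implementation. The names
`AtomicFileWriter` and `RenameBasedWriter` are hypothetical and are not taken
from the pull request; the sketch targets a local POSIX file system, where
`os.replace` is atomic, rather than HDFS or S3.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class AtomicFileWriter(ABC):
    """Hypothetical common interface: a file appears whole or not at all."""

    @abstractmethod
    def write_atomic(self, path: str, data: bytes) -> None:
        """Write `data` to `path` without ever exposing a partial file."""


class RenameBasedWriter(AtomicFileWriter):
    """Write to a temp file in the target directory, then rename it into place."""

    def write_atomic(self, path: str, data: bytes) -> None:
        directory = os.path.dirname(os.path.abspath(path))
        # The temp file lives in the same directory so the rename stays on
        # one file system.
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(data)
                tmp.flush()
                os.fsync(tmp.fileno())
            # Atomic on POSIX local file systems; readers see either the old
            # file (or nothing) or the complete new file.
            os.replace(tmp_path, path)
        except BaseException:
            # Cancel the write: remove the temp file so nothing partial remains.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
```

A caller such as an offset log or a state store would depend only on
`AtomicFileWriter`. A store that can publish a file atomically on close could
then supply its own implementation without a rename step, and the callers
would not change.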