[ 
https://issues.apache.org/jira/browse/SPARK-23966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434682#comment-16434682
 ] 

Apache Spark commented on SPARK-23966:
--------------------------------------

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/21048

> Refactoring all checkpoint file writing logic in a common interface
> -------------------------------------------------------------------
>
>                 Key: SPARK-23966
>                 URL: https://issues.apache.org/jira/browse/SPARK-23966
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>
> Checkpoint files (offset log files, state store files) in Structured 
> Streaming must be written atomically such that no partial files are generated 
> (would break fault-tolerance guarantees). Currently, there are 3 locations 
> which try to do this individually, and in some cases, incorrectly.
>  # HDFSOffsetMetadataLog - This uses a FileManager interface to use any 
> implementation of `FileSystem` or `FileContext` APIs. It preferably loads 
> `FileContext` implementation as FileContext of HDFS has atomic renames.
>  # HDFSBackedStateStore (aka in-memory state store)
>  ## Writing a version.delta file - This uses FileSystem APIs only to perform 
> a rename. This is incorrect as rename is not atomic in HDFS FileSystem 
> implementation.
>  ## Writing a snapshot file - Same as above.
> Current problems:
>  # State Store behavior is incorrect - 
>  # Inflexible - Some file systems provide mechanisms other than 
> write-to-temp-file-and-rename for writing atomically and more efficiently. 
> For example, with S3 you can write directly to the final file and it will be 
> made visible only when the entire file is written and closed correctly. Any 
> failure can be made to terminate the writing without making any partial files 
> visible in S3. The current code does not abstract out this mechanism enough 
> that it can be customized. 
> Solution:
>  # Introduce a common interface that all 3 cases above can use to write 
> checkpoint files atomically. 
>  # This interface must provide the necessary interfaces that allow 
> customization of the write-and-rename mechanism.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to