Steve Loughran commented on SPARK-23966:

W.r.t. FileContext.rename() vs FileSystem.rename(): they are both *meant* to be 
atomic, and they are on HDFS, local, and POSIX-compliant DFSs. Whether any object 
store implements atomic and/or O(1) rename is always ambiguous: it depends on the 
store and even the path under the store (e.g. wasb & locking-based exclusivity).

I would embrace FileContext for its better failure reporting of rename 
problems, but don't expect anything better in terms of atomicity. For object 
stores, the strategy of "write in place" is better. Of course, now you are left 
with the problem of knowing when to use which. This plugin mechanism handles 
that, and when some variant of HADOOP-9565 gets in, there'll be a probe for the 
semantics of an FS path which could be used by some adaptive connector.
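
To make the "when to know what to use" choice concrete, here is a minimal 
sketch of a pluggable writer. It uses plain java.nio on a local file system 
rather than the Hadoop APIs, and every name in it (CheckpointWriteSketch, 
CheckpointWriter, RenameBasedWriter, DirectWriter, forStore, and the 
renameIsAtomic flag standing in for a semantics probe) is hypothetical, not 
something taken from Spark or Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical names throughout; this is not Spark's or Hadoop's API.
final class CheckpointWriteSketch {

    /** Strategy for publishing a complete checkpoint file at its final path. */
    interface CheckpointWriter {
        void write(Path finalPath, byte[] data) throws IOException;
    }

    /** Write to a temp file, then rename: suits stores where rename is atomic. */
    static final class RenameBasedWriter implements CheckpointWriter {
        @Override
        public void write(Path finalPath, byte[] data) throws IOException {
            Path parent = finalPath.toAbsolutePath().getParent();
            Path temp = Files.createTempFile(parent, ".tmp-", ".part");
            try {
                Files.write(temp, data);
                // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the
                // file system cannot do it, rather than silently degrading.
                Files.move(temp, finalPath, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                // No-op after a successful move; cleans up after a failure.
                Files.deleteIfExists(temp);
            }
        }
    }

    /**
     * Write straight to the destination. This stands in for an object store
     * where the object only becomes visible once the upload completes; on a
     * local file system this write is NOT atomic.
     */
    static final class DirectWriter implements CheckpointWriter {
        @Override
        public void write(Path finalPath, byte[] data) throws IOException {
            Files.write(finalPath, data);
        }
    }

    /** Assumption: the flag is the answer some path-semantics probe would give. */
    static CheckpointWriter forStore(boolean renameIsAtomic) {
        return renameIsAtomic ? new RenameBasedWriter() : new DirectWriter();
    }
}
```

A caller would ask forStore(...) for a writer once per checkpoint location and 
then call write(path, bytes) without caring which strategy is behind it. The 
exception thrown by Files.move plays roughly the role that FileContext's 
failure reporting plays for rename.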

> Refactoring all checkpoint file writing logic in a common interface
> -------------------------------------------------------------------
>                 Key: SPARK-23966
>                 URL: https://issues.apache.org/jira/browse/SPARK-23966
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
> Checkpoint files (offset log files, state store files) in Structured 
> Streaming must be written atomically such that no partial files are generated 
> (would break fault-tolerance guarantees). Currently, there are 3 locations 
> which try to do this individually, and in some cases, incorrectly.
>  # HDFSOffsetMetadataLog - This uses a FileManager interface to use any 
> implementation of `FileSystem` or `FileContext` APIs. It preferably loads 
> `FileContext` implementation as FileContext of HDFS has atomic renames.
>  # HDFSBackedStateStore (aka in-memory state store)
>  ## Writing a version.delta file - This uses FileSystem APIs only to perform 
> a rename. This is incorrect as rename is not atomic in HDFS FileSystem 
> implementation.
>  ## Writing a snapshot file - Same as above.
> Current problems:
>  # State Store behavior is incorrect - 
>  # Inflexible - Some file systems provide mechanisms other than 
> write-to-temp-file-and-rename for writing atomically and more efficiently. 
> For example, with S3 you can write directly to the final file and it will be 
> made visible only when the entire file is written and closed correctly. Any 
> failure can be made to terminate the writing without making any partial files 
> visible in S3. The current code does not abstract out this mechanism enough 
> that it can be customized. 
> Solution:
>  # Introduce a common interface that all 3 cases above can use to write 
> checkpoint files atomically. 
>  # This interface must provide the necessary interfaces that allow 
> customization of the write-and-rename mechanism.
