Barre opened a new pull request, #643:
URL: https://github.com/apache/arrow-rs-object-store/pull/643
Call sync_all() on written files and fsync parent directories at all
write-path boundaries (put, copy, rename, multipart complete) so that a
successful return guarantees data is durable on disk, matching the implicit
contract of cloud object stores.
# Rationale for this change
When LocalFileSystem::put (or copy/rename/multipart complete) returns Ok,
callers reasonably expect the data to be durable on disk, since that is the
implicit contract of cloud object stores such as S3 or GCS.
However, LocalFileSystem never called fsync/sync_all, meaning the OS was
free to keep the data in its page cache indefinitely. A crash or power loss
after a successful put could result in data loss or zero-length files.
This change adds sync_all() calls on written files and fsync on parent
directories at every write-path boundary (put_opts, copy_opts, rename_opts,
multipart complete), ensuring that when an operation returns success, both the
file contents and the directory entry pointing to them are durable on stable
storage.
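
As a rough illustration of the pattern described above (not the actual code in this PR), the following standalone sketch writes to a temporary file, calls sync_all() on it, renames it into place, and then fsyncs the parent directory. The names `durable_put` and `sync_dir` and the `.tmp` suffix are hypothetical and chosen only for this example. Directory fsync is shown for Unix only, on the assumption that opening a directory as a `File` is not portable to other platforms.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: write `data` to `dest` so that, once this returns Ok,
/// both the file contents and its directory entry have been flushed.
fn durable_put(dest: &Path, data: &[u8]) -> io::Result<()> {
    // A bare file name has an empty parent; treat that as the current directory.
    let parent = dest
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));

    // Stage the write in a sibling temporary file (suffix is an assumption).
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    // Flush file data and metadata from the page cache to stable storage.
    file.sync_all()?;
    drop(file);

    // Move the staged file into place, then make the new directory entry durable.
    fs::rename(&tmp, dest)?;
    sync_dir(parent)
}

/// Fsync a directory so that entries created or renamed inside it are durable.
#[cfg(unix)]
fn sync_dir(dir: &Path) -> io::Result<()> {
    File::open(dir)?.sync_all()
}

/// Assumption for this sketch: skip directory fsync on non-Unix platforms.
#[cfg(not(unix))]
fn sync_dir(_dir: &Path) -> io::Result<()> {
    Ok(())
}
```

Without the final directory sync, the file contents could be on disk while the directory entry pointing to them is still only in memory, which is why both steps are needed.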
# Are there any user-facing changes?
No.