[
https://issues.apache.org/jira/browse/HADOOP-19251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873898#comment-17873898
]
Steve Loughran commented on HADOOP-19251:
-----------------------------------------
Rename. Joy.
Way way back I did try to export the existing protected
FileSystem.rename(source, dest, options) method as a public API -this is the
one which FileContext invokes but which defaults to being non-atomic (the
exists probes, see). What I love about this one is that it actually fails
meaningfully rather than just returning false, leaving callers to invoke it as
{code}
if (!fs.rename(src, dest)) throw new IOException("rename failed but we have no idea why");
{code}
This is of course the good invocation; the bad one is when they don't check the
result at all. Either way: pretty useless.
HADOOP-11452 make rename/3 public
https://github.com/apache/hadoop/pull/2735
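For reference, a caller of the three-argument rename would look roughly like
this; the method is still protected in FileSystem, so read it as a sketch of
the would-be public API rather than code you can write today (cleanupAndFail()
is a hypothetical caller-side handler):
{code}
// sketch: the three-argument rename raises meaningful exceptions instead of
// returning false. It is still protected in FileSystem today.
try {
  fs.rename(src, dest, Options.Rename.NONE);
} catch (FileAlreadyExistsException e) {
  // the destination already exists and overwrite was not requested
  cleanupAndFail(e);   // hypothetical caller-side handling
} catch (FileNotFoundException e) {
  // the source is missing, or the destination parent directory is
  cleanupAndFail(e);
}
{code}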
I think I got distracted by other stuff but also by some of the implications.
And the fact that even that wasn't enough for my needs.
You might also want to look at
AzureBlobFileSystem.commitSingleFileByRename(src, dest, etag) which implements
fault-tolerant rename on Azure by recovering from load-related failures (503
returned but the rename worked, so a retry fails w/ 404). It also throws
exceptions and returns some information about whether the rename was recovered
from and how long it took, adding to the manifest committer's statistics in
_SUCCESS. Oh, and it is rate limited, because it is often that renaming which
can generate heavy load across the entire storage account and so impact other
applications.
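The recovery trick is roughly this; renameBlob() and getEtag() below are
hypothetical stand-ins for the real ABFS client operations, so read it as a
sketch of the idea rather than the actual hadoop-azure code:
{code}
// sketch of etag-based rename recovery. renameBlob() and getEtag() are
// hypothetical stand-ins for the real ABFS client operations.
boolean recovered = false;
try {
  renameBlob(src, dest);
} catch (FileNotFoundException e) {
  // the source is gone. Either an earlier attempt actually committed
  // (e.g. a 503 came back after the rename went through) or someone else
  // deleted the file. The source etag disambiguates the two cases.
  if (!sourceEtag.equals(getEtag(dest))) {
    throw e;          // not our file: the rename really did fail
  }
  recovered = true;   // our rename completed; treat as success
}
{code}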
All this convinced me that the way to do it would actually be to have a
builder-based rename the way we do for openFile()/createFile(); atomic rename
would be one of the options (along with etag/version id).
{code}
// atomic rename, src etag.
CompletableFuture<RenameOutcome> r = filesystem.initiateRename(source, dest)
    .opt("fs.rename.src.etag", "abb2a")
    .must("fs.rename.atomic", true)
    .build();
RenameOutcome o = FutureIO.awaitFuture(r);
{code}
RenameOutcome would implement IOStatisticsSource; all failures would be raised
as exceptions of some kind (awaitFuture() unwraps these).
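Something like this, purely as a sketch of a type which does not exist yet:
{code}
// sketch of the proposed (non-existent) outcome type.
public class RenameOutcome implements IOStatisticsSource {
  private final boolean recovered;        // was a transient failure recovered from?
  private final IOStatistics statistics;  // duration, retries, ...

  public RenameOutcome(boolean recovered, IOStatistics statistics) {
    this.recovered = recovered;
    this.statistics = statistics;
  }

  public boolean recovered() {
    return recovered;
  }

  @Override
  public IOStatistics getIOStatistics() {
    return statistics;
  }
}
{code}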
{code}
// rename may/may not be atomic. if slower, provide some progress callbacks
CompletableFuture<RenameOutcome> r = filesystem.initiateRename(source, dest)
    .withStatus(sourceFileStatus) // version id/etag can be picked up here;
                                  // source path doesn't need to match status.path, source is normative.
    .withProgress(callback)
    .build();
// here we may have a slower rename which could be cancelled, or chained with
// other operations
r.cancel(true);
{code}
See? This is exactly what we would want for object storage. Options to specify
constraints, a file status to skip a HEAD request, and asynchronous completion
with intermediate progress callbacks.
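And because the result is a CompletableFuture, it can be chained with whatever
the caller does next; again, initiateRename() and RenameOutcome are only
proposals here, only FutureIO.awaitFuture() exists today:
{code}
// chain follow-up work onto the (proposed) rename future, with a timeout.
FutureIO.awaitFuture(
    r.thenAccept(outcome ->
        LOG.info("rename statistics: {}", outcome.getIOStatistics())),
    30, TimeUnit.SECONDS);
{code}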
But it gets really complicated really fast -and it will become a commitment to
get right and maintain. I'm not saying that isn't the right thing to do -just
that it was going to take too much time on something which wasn't actually
going to work properly on GCS and S3 anyway. I had more important things to do.
One thing which is a lot more tractable is to define PathCapabilities probes
for rename semantics, which filesystems can be queried for.
fs.rename.file.atomic : file rename is atomic (hdfs, file, abfs, gcs)
fs.rename.directory.atomic : same for dirs. false for GCS
fs.rename.file.fast: O(1) performance independent of file size. false for AWS
S3; true for most others
fs.rename.directory.fast: O(1) performance for dir rename, independent of
atomicity. false for s3 and gcs
Doing that with some contract test to probe stuff may be good. We'd return true
for all "real" filesystems; for the object stores we'd have to check and update
each one, either internally or as PRs to their external repos (gcs).
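On the caller's side that becomes a hasPathCapability() probe -that API already
exists; only the capability names above are new:
{code}
// probe the destination store for the proposed rename capability before
// relying on it; the capability string is the one suggested above.
Path dest = new Path("s3a://bucket/job-output");
FileSystem fs = dest.getFileSystem(conf);
if (!fs.hasPathCapability(dest, "fs.rename.directory.atomic")) {
  throw new UnsupportedOperationException(
      "directory rename is not atomic on " + fs.getUri());
}
{code}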
Assuming you are trying to commit work through rename and want to know whether
the semantics match your requirements, that should be enough. If you want to
take that on we can help supervise. It is low on code; understanding what the
stores do is the important thing.
There is another strategy, which is to use the Abortable interface which the
S3A output streams implement: it lets you write to the destination but back off
if you don't want to commit. Problem here: S3 doesn't have a no-overwrite flag
the way some other stores do, so you still cannot use it for an atomic write.
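Roughly like this; Abortable and FSDataOutputStream.abort() are real (Hadoop
3.3.1+), but treat the flow as a sketch, with writeOutput() and shouldCommit
as placeholders for the caller's own logic:
{code}
// write to the destination, then either commit (close) or back off (abort).
// On S3A, abort() cancels the pending multipart upload so nothing ever
// becomes visible at dest. writeOutput()/shouldCommit are placeholders.
FSDataOutputStream out = fs.create(dest, true);
try {
  writeOutput(out);
  if (shouldCommit) {
    out.close();    // completes the upload; data becomes visible
  } else {
    out.abort();    // discards the upload; nothing manifest at dest
  }
} catch (IOException e) {
  out.abort();
  throw e;
}
{code}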
Meanwhile, if you are worrying about object stores, how about you take a look
at https://github.com/apache/hadoop/pull/6938 ? We have encountered this in the
wild -it looks rare enough that the fact the AWS SDK can't recover has never
been spotted by their team.
> Add Options.Rename.THROW_NON_ATOMIC
> -----------------------------------
>
> Key: HADOOP-19251
> URL: https://issues.apache.org/jira/browse/HADOOP-19251
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 3.3.6
> Reporter: Alkis Evlogimenos
> Priority: Major
>
> I propose we add an option `Options.Rename.THROW_NON_ATOMIC` to change
> `rename()` behavior to throw when the underlying filesystem's rename
> operation is not atomic.
> This would be useful for callers that expect to perform an atomic op but want
> to fail when an atomic rename is not possible.
>
> At first this might seem something that can be done by querying capabilities
> of the filesystem but that would only work on real filesystems. A motivating
> example would be a virtual filesystem for which paths can resolve to any
> concrete filesystem (s3, etc). If `rename()` is called with two virtual paths
> that resolve to different filesystems (s3 and gcs for example) then obviously
> the operation can't be atomic since bytes must be copied from one fs to
> another.
>
> What do you think [~steve_l] ?