[
https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807864#comment-15807864
]
Steve Loughran commented on HADOOP-11452:
-----------------------------------------
We've kind of gone round in circles on the "what features" probe, because it's
so fluid. HADOOP-9565 has discussed this. I think it's time to look at the
method again, with a list of well known strings to look for. Blobstores can add
their own "atomic-put-on-close", etc.
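
To make the "list of well known strings" idea concrete, here is a minimal
sketch of what such a probe might look like. Everything in it is an
assumption for illustration: the {{FeatureProbe}} interface, the
{{hasFeature}} method, the {{BlobStoreSketch}} class and the
"atomic-rename" constant are hypothetical names, not anything in Hadoop. Only
the "atomic-put-on-close" string comes from the comment above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical probe interface; not part of any Hadoop API. */
interface FeatureProbe {
    // Well-known feature names. Stores may also declare their own strings.
    String ATOMIC_RENAME = "atomic-rename";              // assumed name
    String ATOMIC_PUT_ON_CLOSE = "atomic-put-on-close";  // from the discussion

    /** Returns true only if the store positively declares the feature. */
    boolean hasFeature(String feature);
}

/** Hypothetical store that declares a fixed set of features. */
final class BlobStoreSketch implements FeatureProbe {
    private final Set<String> features;

    BlobStoreSketch(String... declared) {
        this.features = new HashSet<>(Arrays.asList(declared));
    }

    @Override
    public boolean hasFeature(String feature) {
        // Unknown strings are simply "not supported", so callers can probe
        // for features that did not exist when the store was written.
        return features.contains(feature);
    }
}
```

Using strings rather than an enum is the point of the sketch: a caller can ask
about a feature the store has never heard of and get a plain "no".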
Now regarding a patch to say "I must have atomic", well, yes, if you declare
you want it, why not have the thing fail-fast? As it is, right now you get
non-atomic renames *and don't even know*.
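
A rough sketch of the fail-fast behaviour, using an in-memory map instead of a
real filesystem. The {{FailFastStore}} class and the {{REQUIRE_ATOMIC}} option
are hypothetical names invented for this illustration; they are not the
options in the patch or in Hadoop.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store; shows the fail-fast check only. */
final class FailFastStore {
    /** Hypothetical option names, chosen for this sketch. */
    enum RenameOption { OVERWRITE, REQUIRE_ATOMIC }

    private final boolean atomicRename;
    private final Map<String, String> files = new HashMap<>();

    FailFastStore(boolean atomicRename) {
        this.atomicRename = atomicRename;
    }

    void put(String path, String data) { files.put(path, data); }

    String get(String path) { return files.get(path); }

    void rename(String src, String dest, RenameOption... options)
            throws IOException {
        boolean requireAtomic =
            Arrays.asList(options).contains(RenameOption.REQUIRE_ATOMIC);
        // Reject the call before touching any data, so the caller finds out
        // immediately instead of silently getting a non-atomic rename.
        if (requireAtomic && !atomicRename) {
            throw new IOException("Atomic rename required but not supported: "
                + src + " -> " + dest);
        }
        if (!files.containsKey(src)) {
            throw new FileNotFoundException(src);
        }
        files.put(dest, files.remove(src));
    }
}
```

A caller that does not pass the option gets today's behaviour; a caller that
does pass it either gets an atomic rename or an exception, never a silent
downgrade.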
W.r.t. S3A, we are going to do things which rely on PUT being atomic; see
HADOOP-13786 for the full algorithm. All I was proposing was a way for people
to say "This really, really must be atomic", so that people's code which
contains fundamental requirements of rename semantics isn't going to get deep
into trouble on S3 or Swift (but not Azure). What gets into trouble? MRv1 and
MRv2 committers, for example.
Making things public? Well, FileStatus is ubiquitous; too late to remove. And,
because it lets the underlying implementation do what it wants, it is great to
work with from blobstore code, as we can do lots to minimise overhead. For
example, {{FileContext.listFiles()}} implements its recursive treewalk, which
would seemingly make HADOOP-13208 impossible to support. I know FC is cleaner,
but for playing blobstore games, the simpler FS API is easier to improve,
despite its lack of consistency across impls.
So instead we have classic {{boolean rename(src, dest)}} where nobody really
knows what to do when, say, the source doesn't exist, dest is "/", etc, etc.
And we have a rename(src, dest, options), where the base implementation, the
protected one in {{FileSystem}}, is in fact broken as in "will delete your
data" broken. I consider that important to fix, even if it currently only bites
anyone using FileContext.rename(src, src, overwrite).
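
To show how a rename-with-overwrite can end up deleting data when source and
destination are the same path, here is a toy model over an in-memory map. It
is an assumption about the general shape of such a bug, not a reading of the
Hadoop source; {{RenameOntoSelfSketch}} and both method names are hypothetical.
Treating rename-onto-self as a no-op in the guarded version is an arbitrary
choice for the sketch; raising an exception would be just as defensible.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of overwrite-by-delete; not Hadoop code. */
final class RenameOntoSelfSketch {
    final Map<String, String> files = new HashMap<>();

    /** Naive overwrite: delete dest, then move src. Loses data if src equals dest. */
    boolean naiveRename(String src, String dest) {
        files.remove(dest);              // when dest is src, this deletes the source
        String data = files.remove(src);
        if (data == null) {
            return false;                // nothing left to move
        }
        files.put(dest, data);
        return true;
    }

    /** Guarded version: check the source first and never delete it. */
    boolean guardedRename(String src, String dest) {
        if (!files.containsKey(src)) {
            return false;
        }
        if (src.equals(dest)) {
            return true;                 // no-op; an assumed policy for this sketch
        }
        files.put(dest, files.remove(src));
        return true;
    }
}
```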
Now, the current patch *doesn't* do anything w.r.t. atomic renames: it opens up
the method, fixes its base rename call to not delete the source, tries to
specify what actually goes on in HDFS, and pulls the error strings out of DFS
and makes them shared constants, so that the other implementations can raise
exceptions with identical messages.
Do you want to review it? I know it's not complete: it doesn't have the tests
for the corner cases I've managed to identify. But at least have a look at the
FS spec document and show me where I've misunderstood things.
> Revisit FileSystem.rename(path, path, options)
> ----------------------------------------------
>
> Key: HADOOP-11452
> URL: https://issues.apache.org/jira/browse/HADOOP-11452
> Project: Hadoop Common
> Issue Type: Task
> Components: fs
> Affects Versions: 2.7.3
> Reporter: Yi Liu
> Assignee: Steve Loughran
> Attachments: HADOOP-11452-001.patch, HADOOP-11452-002.patch
>
>
> Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected
> and with _deprecated_ annotation. And the default implementation is not
> atomic.
> So this method is not able to be used outside. On the other hand, HDFS has a
> good and atomic implementation. (Also an interesting thing in {{DFSClient}},
> the _deprecated_ annotations for these two methods are opposite).
> It makes sense to make public for {{rename}} with _Rename options_, since
> it's atomic for rename+overwrite, also it saves RPC calls if user desires
> rename+overwrite.