[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785483#comment-13785483 ]
Daryn Sharp commented on HADOOP-9984:
-------------------------------------
bq. Nothing has changed with regards to paths. They're still always returned
resolved. That's the way symlinks have been handled since they were first added
to Hadoop and this patch doesn't affect that.
Yes, but "since they were first added" is not a very long time in the case of
FileSystem. We must have complete compatibility for 1.x and 0.23 users to make
a seamless transition.
No existing pre-2.x FileSystem code is prepared for _any_ semantic differences
in methods and path handling.
Returning the resolved path is still exposing symlinks to user code that isn't
prepared to deal with symlinks. Path filters may silently "fail" by skipping
files simply because they were referenced via symlink and don't contain an
expected prefix or substring (that was resolved away via symlink). User code
that explicitly filters paths may also fail via silent data loss.
Any form of data loss, especially silent, is not an issue to be taken lightly.
We can't rewrite all production code to be symlink aware, and we can't
recertify all production code to ensure it can handle a symlink anywhere in the
path where the target may or may not retain the original name of that path
component.
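A minimal sketch of the path-filter failure described above, in plain Java with no Hadoop dependency. The symlink (/data/logs/current -> /archive/2013-10/run42), the file names, and the helper methods are all hypothetical and exist only for illustration; resolve() merely imitates a listing that hands back resolved paths and is not the FileSystem implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResolvedPathFilterSketch {
    // Hypothetical symlink: /data/logs/current -> /archive/2013-10/run42
    static final String LINK = "/data/logs/current";
    static final String TARGET = "/archive/2013-10/run42";

    // Imitates a listing that returns resolved paths: the link component
    // is replaced by its target, so the original name disappears.
    static String resolve(String path) {
        if (path.equals(LINK) || path.startsWith(LINK + "/")) {
            return TARGET + path.substring(LINK.length());
        }
        return path;
    }

    // Resolves every requested path, as a symlink-resolving listing would.
    static List<String> listResolved(List<String> requested) {
        List<String> out = new ArrayList<>();
        for (String p : requested) {
            out.add(resolve(p));
        }
        return out;
    }

    // Pre-symlink user code: keep only paths under the expected prefix.
    static List<String> filterByPrefix(List<String> paths, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (p.startsWith(prefix)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> requested = Arrays.asList(
                "/data/logs/current/part-00000",
                "/data/logs/old/part-00000");
        List<String> kept = filterByPrefix(listResolved(requested), "/data/logs/");
        System.out.println("requested: " + requested);
        System.out.println("kept:      " + kept);
    }
}
```

Both requested files live under /data/logs/ from the caller's point of view, but the one reached through the link comes back under /archive/... and is dropped by the prefix check without any error being raised.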
bq. The design makes this necessary, unless you want to do multiple link
resolution RPCs every time you use a path, which is not scalable.
I don't say this lightly given the throughput work I'm doing on the NN: I'd
argue that the performance hit is a necessary evil. Not just for symlink
transparency, but for correctness. Every time I operate on a path that contains
a symlink, I want the "current" resolution of the path - not what it resolved
to a minute or an hour or a day ago...
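The staleness concern can be illustrated with a small in-memory model, again in plain Java with no Hadoop dependency. The class, the link table, and the paths are hypothetical, and only a single level of link resolution is modeled; the point is only the difference between holding on to a previously resolved path and resolving again at the time of use.

```java
import java.util.HashMap;
import java.util.Map;

class StaleResolutionSketch {
    // Hypothetical link table: link path -> current target.
    private final Map<String, String> links = new HashMap<>();

    // Creates the link or repoints an existing one.
    void link(String linkPath, String target) {
        links.put(linkPath, target);
    }

    // Resolves against the link table as it is right now (one level only).
    String resolve(String path) {
        for (Map.Entry<String, String> e : links.entrySet()) {
            String l = e.getKey();
            if (path.equals(l) || path.startsWith(l + "/")) {
                return e.getValue() + path.substring(l.length());
            }
        }
        return path;
    }

    public static void main(String[] args) {
        StaleResolutionSketch fs = new StaleResolutionSketch();
        fs.link("/data/current", "/data/v1");

        // A client that keeps the resolved path from an earlier listing.
        String cached = fs.resolve("/data/current/part-0");

        // The link is repointed some time later.
        fs.link("/data/current", "/data/v2");

        System.out.println("cached resolution: " + cached);
        System.out.println("fresh resolution:  " + fs.resolve("/data/current/part-0"));
    }
}
```

After the link is repointed, the cached string still names the old target, while resolving the original path again follows the link to its new target. Resolving at the time of use is what costs the extra round trips discussed above.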
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by
> default
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-9984
> URL: https://issues.apache.org/jira/browse/HADOOP-9984
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.1.0-beta
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Blocker
> Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch,
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch,
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch,
> HADOOP-9984.013.patch, HADOOP-9984.014.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that
> many existing HDFS clients would be broken by listStatus and globStatus
> returning symlinks. One example is applications that assume that
> !FileStatus#isFile implies that the inode is a directory. As we discussed in
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.