[
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779369#comment-13779369
]
Colin Patrick McCabe commented on HADOOP-9984:
----------------------------------------------
This patch changes {{listStatus}} and {{globStatus}} to resolve symlinks.
If a symlink can't be resolved when doing a {{listStatus}}, a
{{DirectoryContentsResolutionException}} is thrown which contains the
resolution exception. This will usually be {{FileNotFoundException}}, but it
doesn't have to be. It could also be some other error that occurred when
trying to do the RPC. The globber ignores missing files, just as it does now. The
implementation also makes this necessary, since the globber catches and
discards {{FileNotFoundException}}, and dangling symlinks always manifest as
{{FileNotFoundException}}.
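
To make the two behaviors concrete, here is a minimal in-memory sketch. It does not use the Hadoop classes: {{ToyDirectory}}, {{listResolved}}, {{globResolved}} and {{ContentsResolutionException}} are hypothetical stand-ins invented for illustration (the last one plays the role of {{DirectoryContentsResolutionException}}), so treat it as a model of the intent rather than the patch itself.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the wrapping exception described above. */
class ContentsResolutionException extends IOException {
    ContentsResolutionException(String entry, IOException cause) {
        super("could not resolve " + entry, cause);
    }
}

/** Toy directory: entry name -> symlink target, or null for a plain file. */
class ToyDirectory {
    private final Map<String, String> entries = new LinkedHashMap<>();
    private final Set<String> existingTargets;

    ToyDirectory(Set<String> existingTargets) {
        this.existingTargets = existingTargets;
    }

    ToyDirectory add(String name, String linkTarget) {
        entries.put(name, linkTarget);
        return this;
    }

    /** Returns the resolved path; a dangling link shows up as FileNotFoundException. */
    private String resolve(String name) throws FileNotFoundException {
        String target = entries.get(name);
        if (target == null) {
            return name; // plain file, nothing to resolve
        }
        if (!existingTargets.contains(target)) {
            throw new FileNotFoundException(target);
        }
        return target;
    }

    /** listStatus-like: an unresolvable link fails the listing, with the cause attached. */
    List<String> listResolved() throws ContentsResolutionException {
        List<String> out = new ArrayList<>();
        for (String name : entries.keySet()) {
            try {
                out.add(resolve(name));
            } catch (FileNotFoundException e) {
                throw new ContentsResolutionException(name, e);
            }
        }
        return out;
    }

    /** globber-like: missing files (including dangling links) are skipped. */
    List<String> globResolved() {
        List<String> out = new ArrayList<>();
        for (String name : entries.keySet()) {
            try {
                out.add(resolve(name));
            } catch (FileNotFoundException e) {
                // ignored, mirroring how the globber discards FileNotFoundException
            }
        }
        return out;
    }
}
```

In this model a caller of {{listResolved}} can inspect {{getCause()}} to tell a dangling link apart from some other failure, while {{globResolved}} simply returns the entries that could be resolved.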
I added a new API, {{listLinkStatus}}, which is like {{listStatus}}, but does
not resolve symlinks. {{listLinkStatus}} is necessary here, since
{{globStatus}} needs to glob on file name, not target name (and this patch
changes {{listStatus}} to resolve links, as previously mentioned.) Filesystems
which don't (yet) support symlinks map {{listLinkStatus}} to {{listStatus}},
similarly to how we handle {{getFileLinkStatus}}.
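
The fallback for filesystems without symlink support can be pictured roughly as follows. This is a sketch with made-up types ({{SketchFs}} and {{NoSymlinkFs}} are hypothetical, and entries are plain strings instead of {{FileStatus}} objects); it only illustrates the delegation pattern, not the real signatures.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical base class standing in for a filesystem API. */
abstract class SketchFs {
    /** Lists a directory, resolving symlinks where the filesystem has them. */
    abstract List<String> listStatus(String dir);

    /**
     * Lists a directory without resolving symlinks. The default assumes the
     * filesystem has no symlinks, so both calls return the same entries.
     * A symlink-aware subclass would override this method.
     */
    List<String> listLinkStatus(String dir) {
        return listStatus(dir);
    }
}

/** A filesystem with no symlink support only implements listStatus. */
class NoSymlinkFs extends SketchFs {
    @Override
    List<String> listStatus(String dir) {
        return Arrays.asList(dir + "/a", dir + "/b");
    }
}
```

With this shape, code such as the globber can call {{listLinkStatus}} unconditionally and still work on filesystems that never heard of symlinks.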
In Globber, I combined {{authorityFromPath}} and {{schemeFromPath}} into a
single function, {{uriToSchemeAndAuthority}}. This was necessary since in
cases where we accept the scheme of the passed-in path, we should also accept its
authority. So, for example, when processing {{file:///tmp/*}}, we want the
scheme to show up as "file" and the authority to be null. Previously, we were
getting the scheme as file, but the authority as the default authority,
something like "{{username@host}}".
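
A rough sketch of the intended rule, using only {{java.net.URI}}: if the path carries a scheme, take both scheme and authority from the path; otherwise take both from the filesystem default. The class {{SchemeAndAuthority}} and the method {{resolve}} are hypothetical names for illustration, and the body is an assumption about the behavior described above, not the code in the patch.

```java
import java.net.URI;

/** Hypothetical holder for a (scheme, authority) pair. */
final class SchemeAndAuthority {
    final String scheme;
    final String authority; // may be null, e.g. for file:///tmp/*

    SchemeAndAuthority(String scheme, String authority) {
        this.scheme = scheme;
        this.authority = authority;
    }

    /**
     * Takes scheme and authority together from the same source: the path if
     * it names a scheme, otherwise the default URI. Mixing the path's scheme
     * with the default authority is the bug described above.
     */
    static SchemeAndAuthority resolve(URI path, URI defaultUri) {
        if (path.getScheme() != null) {
            return new SchemeAndAuthority(path.getScheme(), path.getAuthority());
        }
        return new SchemeAndAuthority(defaultUri.getScheme(), defaultUri.getAuthority());
    }

    public static void main(String[] args) {
        URI defaultUri = URI.create("hdfs://username@host/");
        SchemeAndAuthority sa = resolve(URI.create("file:///tmp/*"), defaultUri);
        System.out.println(sa.scheme + " / " + sa.authority);
    }
}
```

The point of returning the pair from one function is that the two values can no longer come from different sources.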
I fixed all the symlink-related unit tests in {{TestGlobPaths}} and added some
more. I added a test of {{listStatus}} behavior with dangling links to
{{SymlinkBaseTest}}.
Path filters currently match on resolved path, both in {{globStatus}} and
{{listStatus}}. The rationale is:
* When a filesystem goes from not supporting symlinks to supporting symlinks,
we don't want existing code to break. If we always apply the path filter to the
resolved path, the behavior visible to code will be the same whether or not the
filesystem is aware of symlinks.
* Globbing on the resolved path will make certain optimizations possible in the
globber when {{resolveLinks=true}}.
* It seems more intuitive to filter on the path which you're actually returning.
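
As an illustration of filtering on the resolved path, consider this small sketch. {{ResolvedFilterDemo}} and {{listFiltered}} are hypothetical, the directory is modeled as a map from entry name to resolved path, and a {{Predicate<String>}} stands in for a path filter; none of this is the actual Hadoop API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ResolvedFilterDemo {
    /**
     * Applies the filter to each entry's resolved path and returns the
     * resolved paths that pass, so the filter sees exactly what the caller gets.
     */
    static List<String> listFiltered(Map<String, String> nameToResolved,
                                     Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String resolved : nameToResolved.values()) {
            if (filter.test(resolved)) {
                out.add(resolved);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> dir = new LinkedHashMap<>();
        dir.put("/in/part-0000", "/in/part-0000");   // plain file resolves to itself
        dir.put("/in/latest", "/data/part-0001");    // symlink to a part file
        dir.put("/in/notes", "/data/README");        // symlink to something else
        System.out.println(listFiltered(dir, p -> p.contains("/part-")));
    }
}
```

Here a filter written before symlinks existed (accept anything containing {{/part-}}) keeps working: it accepts the target of {{/in/latest}} even though the link name itself would not match.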
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by
> default
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-9984
> URL: https://issues.apache.org/jira/browse/HADOOP-9984
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.1.0-beta
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Blocker
> Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch,
> HADOOP-9984.005.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that
> many existing HDFS clients would be broken by listStatus and globStatus
> returning symlinks. One example is applications that assume that
> !FileStatus#isFile implies that the inode is a directory. As we discussed in
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.