[
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777852#comment-13777852
]
Andrew Wang commented on HADOOP-9984:
-------------------------------------
Cross-posting some of my review feedback from HADOOP-9981 that Colin plans to
address in this JIRA instead:
{quote}
* I think we have an existing bug in the paths of the returned FileStatus
objects. When going through a glob, the path is set to the built-up path, which
can include symlinks, while a non-glob uses getFileStatus, which returns a
resolved path. I'm pretty sure a FileStatus is supposed to have a resolved path.
This is complicated by the fact that PathFilter still needs to compare against
the complete built-up path; maybe we could do something like:
{code}
if (filter.accept(new Path(prefix, status.getPath().getName()))) {
{code}
* Our symlink resolution is currently inconsistent: listStatus does not resolve
its results, but getFileStatus does. Shouldn't this be getFileLinkStatus? Or are
we waiting to fix this again in HDFS-9877 when it gets recommitted? I know
HADOOP-9972 with the new APIs is coming down the pipe, so I just wanted to
bring this up.
* I'd like to see tests that would have caught these correctness concerns: that
resolved paths are returned correctly (with and without a wildcard), that
PathFilters are matching against built-up paths as expected (with and without
wildcards), and the looping /a/b -> .. symlink case you mentioned in a comment.
Whether it's a terminal or intermediate wildcard also matters here. There are
unfortunately a lot of edge cases.
{quote}
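To make the first point above concrete, here is a rough, self-contained sketch of filtering against the built-up path while still returning resolved paths. It is not the Hadoop implementation: plain strings stand in for Path, and the Status class, the name() helper, and the accept() method are all hypothetical names invented for this illustration. It also assumes that resolution leaves the last path component's name unchanged, which is the same assumption the one-line suggestion in the review makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustration only: strings stand in for Hadoop Path/FileStatus objects. */
final class GlobFilterSketch {

    /** Hypothetical stand-in for a FileStatus: holds only the resolved path. */
    static final class Status {
        final String resolvedPath;

        Status(String resolvedPath) {
            this.resolvedPath = resolvedPath;
        }
    }

    /** Returns the last path component, similar in spirit to Path#getName. */
    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /**
     * Tests each candidate against the built-up path (prefix + name), but
     * returns the statuses unchanged so callers still see resolved paths.
     */
    static List<Status> accept(String prefix, List<Status> candidates,
                               Predicate<String> filter) {
        List<Status> accepted = new ArrayList<>();
        for (Status status : candidates) {
            String builtUp = prefix + "/" + name(status.resolvedPath);
            if (filter.test(builtUp)) {
                accepted.add(status);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Assume /link is a symlink to /data, so /link/part-0 resolves to /data/part-0.
        List<Status> candidates =
            List.of(new Status("/data/part-0"), new Status("/data/_tmp"));
        // The filter is written in terms of the path the caller typed (/link/...).
        List<Status> result =
            accept("/link", candidates, p -> p.startsWith("/link/part-"));
        for (Status s : result) {
            System.out.println(s.resolvedPath); // prints /data/part-0
        }
    }
}
```

The filter sees /link/part-0 (the built-up path), while the caller gets back /data/part-0 (the resolved path). A test along these lines, with and without a wildcard in the prefix, would cover the first two cases listed in the last bullet.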
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by
> default
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-9984
> URL: https://issues.apache.org/jira/browse/HADOOP-9984
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.1.0-beta
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Blocker
> Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that
> many existing HDFS clients would be broken by listStatus and globStatus
> returning symlinks. One example is applications that assume that
> !FileStatus#isFile implies that the inode is a directory. As we discussed in
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.
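
The client assumption mentioned in the description can be sketched as follows. This is a minimal illustration, not Hadoop code: the Kind enum and both helper methods are hypothetical stand-ins for what a client might derive from FileStatus#isFile, FileStatus#isDirectory, and FileStatus#isSymlink.

```java
/** Illustration only: models the three kinds of inode a listing could return. */
final class IsFileAssumptionSketch {

    enum Kind { FILE, DIRECTORY, SYMLINK }

    /** Fragile client logic: anything that is not a file is taken to be a directory. */
    static boolean looksLikeDirectoryFragile(Kind kind) {
        return kind != Kind.FILE;
    }

    /** Safer client logic: asks the positive question directly. */
    static boolean isDirectory(Kind kind) {
        return kind == Kind.DIRECTORY;
    }

    public static void main(String[] args) {
        // An unresolved symlink is misclassified by the fragile check.
        System.out.println(looksLikeDirectoryFragile(Kind.SYMLINK)); // prints true
        System.out.println(isDirectory(Kind.SYMLINK));               // prints false
    }
}
```

If listStatus and globStatus return resolved entries by default, the fragile check keeps working, because callers never see the SYMLINK case unless they opt in.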