[ 
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777852#comment-13777852
 ] 

Andrew Wang commented on HADOOP-9984:
-------------------------------------

Cross-posting some of my review feedback from HADOOP-9981 that Colin plans to 
address in this JIRA instead:

{quote}
* I think we have an existing bug in the paths of the returned FileStatus 
objects. When going through a glob, the path is set to the built-up path, 
which can include symlinks, while the non-glob case uses getFileStatus, which 
returns a resolved path. I'm pretty sure a FileStatus is supposed to carry a 
resolved path. This is complicated by the fact that PathFilter still needs to 
compare against the complete built-up path; maybe we could do something like:
{code}
if (filter.accept(new Path(prefix, status.getPath().getName()))) {
{code}
* Our symlink resolution right now is inconsistent: listStatus does not resolve 
results, getFileStatus does. Shouldn't this be getFileLinkStatus? Or are we 
waiting to fix this again in HADOOP-9877 when it gets recommitted? I know 
HADOOP-9972 with the new APIs is coming down the pipe, so I just wanted to 
bring this up.
* I'd like to see tests that would have caught these correctness concerns: that 
resolved paths are returned correctly (with and without a wildcard), that 
PathFilters are matching against built-up paths as expected (with and without 
wildcards), and the looping /a/b -> .. symlink case you mentioned in a comment. 
Whether it's a terminal or intermediate wildcard also matters here. There are 
unfortunately a lot of edge cases.
{quote}
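
To make the filter suggestion above concrete, here is a minimal, self-contained sketch in plain Java. It deliberately does not use the Hadoop API: Status, filterPath, and accept are hypothetical names, java.nio.file.Path stands in for Hadoop's Path, and Predicate<Path> stands in for PathFilter. It only illustrates the idea of keeping the resolved path in the returned status while matching the filter against the built-up prefix.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

public class GlobFilterSketch {
    /** Hypothetical stand-in for a FileStatus: it only carries a resolved path. */
    public static final class Status {
        public final Path resolved;

        public Status(Path resolved) {
            this.resolved = resolved;
        }
    }

    /**
     * Rebuilds the path the filter should see: the built-up prefix (which may
     * contain symlinks) plus the last component of the resolved path.
     * Assumes the resolved path is not the root, which has no file name.
     */
    public static Path filterPath(Path prefix, Status status) {
        return prefix.resolve(status.resolved.getFileName());
    }

    /** Applies the filter to the built-up path, leaving the status untouched. */
    public static boolean accept(Predicate<Path> filter, Path prefix, Status status) {
        return filter.test(filterPath(prefix, status));
    }

    public static void main(String[] args) {
        // Assumed layout: /link is a symlink to /real/dir.
        Status status = new Status(Paths.get("/real/dir/file.txt"));
        Path prefix = Paths.get("/link");
        System.out.println("filter sees: " + filterPath(prefix, status));
        System.out.println("caller gets: " + status.resolved);
    }
}
```

One caveat in this sketch: the name is taken from the resolved path, so if the final component is itself a symlink, the filter would see the target's name rather than the link's name.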
                
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by 
> default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that 
> many existing HDFS clients would be broken by listStatus and globStatus 
> returning symlinks.  One example is applications that assume that 
> !FileStatus#isFile implies that the inode is a directory.  As we discussed in 
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning 
> resolved paths.
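
The client assumption described above can be illustrated with a small, self-contained sketch. It does not use Hadoop's FileStatus; the Kind enum and both helper methods are hypothetical and exist only to show why "not a file" stops meaning "directory" once a listing can return symlinks.

```java
public class SymlinkAssumptionSketch {
    /** Hypothetical three-way inode kind; not part of any Hadoop API. */
    public enum Kind { FILE, DIRECTORY, SYMLINK }

    /** Fragile check: treats anything that is not a file as a directory. */
    public static boolean looksLikeDirectoryFragile(Kind kind) {
        return kind != Kind.FILE;
    }

    /** Explicit check: only a real directory counts as a directory. */
    public static boolean isDirectoryExplicit(Kind kind) {
        return kind == Kind.DIRECTORY;
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind
                + " fragile=" + looksLikeDirectoryFragile(kind)
                + " explicit=" + isDirectoryExplicit(kind));
        }
    }
}
```

The two checks agree for files and directories and disagree only for symlinks, which is the case that unresolved listings would newly expose to existing clients.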
