[ 
https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776512#comment-13776512
 ] 

Colin Patrick McCabe commented on HADOOP-9972:
----------------------------------------------

bq. I mean listStatus(Path, PathOption) should call into listLinkStatus(it is 
HDFS::listStatus which is a primitive RPC call), not the other way around. I 
wonder how can we implement listStatus(Path, PathOption) without the primitive 
of listLinkStatus(Path)?

FileSystem#listStatus(Path, PathOption) should just be an abstract method 
that is implemented by DistributedFileSystem and the other implementation 
classes.  DistributedFileSystem and the other FileSystem implementations need 
access to the other things in PathOption, such as the error handler.  Also, if 
we want to add more options in the future, we don't want to create 
listLinkStatusWithFoo and listLinkStatusWithFooAndBar.  Just listStatus(Path, 
PathOption).
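
To make the shape of that API concrete, here is a minimal, self-contained 
sketch.  It is not the actual patch: SketchFileSystem, InMemoryFileSystem, the 
resolveLinks flag, and the BiConsumer-based error handler are all hypothetical 
stand-ins, and paths are plain strings rather than Hadoop Path objects.  The 
only things taken from the discussion are the listStatus(Path, PathOption) 
signature and the idea that PathOption carries an error handler.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the proposed PathOption: one object that carries
// every listing option, so adding an option never requires a new method name.
final class PathOption {
    final boolean resolveLinks;                       // assumed option, for illustration only
    final BiConsumer<String, Exception> errorHandler; // the "error handler" mentioned above

    PathOption(boolean resolveLinks, BiConsumer<String, Exception> errorHandler) {
        this.resolveLinks = resolveLinks;
        this.errorHandler = errorHandler;
    }
}

// Simplified FileSystem: the single abstract primitive each implementation provides.
abstract class SketchFileSystem {
    abstract List<String> listStatus(String path, PathOption option);
}

// Toy implementation standing in for DistributedFileSystem.  It sees the whole
// PathOption, so it can report a dangling link through the caller's handler.
final class InMemoryFileSystem extends SketchFileSystem {
    @Override
    List<String> listStatus(String path, PathOption option) {
        List<String> entries = new ArrayList<>();
        entries.add(path + "/data.txt");
        String link = path + "/dangling-link";
        if (option.resolveLinks) {
            // Resolution fails in this toy example; the caller decides what to do.
            option.errorHandler.accept(link,
                new FileNotFoundException("unresolved symlink: " + link));
        } else {
            // Not resolving: return the link itself as an entry.
            entries.add(link);
        }
        return entries;
    }
}
```

A caller that wants strict behavior would pass a handler that rethrows; a 
caller that wants lenient behavior would pass one that logs or collects the 
failures.  Either way the method name stays the same.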

I understand that bash globs ignore errors.  But that's not really a good 
reason for us to ignore them too.  Hadoop and HDFS exist in an environment 
where networks are unreliable.  So if globStatus swallows unresolved symlink 
errors, you could find yourself in a situation where your cross-filesystem 
symlink fails to resolve, and you silently operate on data that isn't what you 
think it is.  There are also compatibility reasons not to ignore errors: 
errors were not ignored in branch-1.  We discussed this on HADOOP-9929.
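
The failure mode can be illustrated with a small, self-contained sketch.  
Everything in it is hypothetical (the GlobErrorSketch class, the map-based 
directory, and the "unreachable:" prefix that simulates a failed 
cross-filesystem lookup); it is not the real globStatus code, only a picture of 
the difference between swallowing and propagating a resolution error.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GlobErrorSketch {
    // Directory model: entry name -> symlink target, or null for a plain file.
    // A target starting with "unreachable:" simulates a link that cannot be resolved.
    static String resolve(String name, String target) throws IOException {
        if (target == null) {
            return name;
        }
        if (target.startsWith("unreachable:")) {
            throw new IOException("cannot resolve symlink " + name + " -> " + target);
        }
        return target;
    }

    // Lenient variant: a failed resolution silently drops the entry.
    static List<String> globSwallowing(Map<String, String> dir) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : dir.entrySet()) {
            try {
                matches.add(resolve(e.getKey(), e.getValue()));
            } catch (IOException ignored) {
                // The caller never learns that an entry is missing.
            }
        }
        return matches;
    }

    // Strict variant: the first failed resolution is reported to the caller.
    static List<String> globStrict(Map<String, String> dir) throws IOException {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : dir.entrySet()) {
            matches.add(resolve(e.getKey(), e.getValue()));
        }
        return matches;
    }
}
```

With one plain file and one unreachable link in the directory, the lenient 
variant returns a shorter list with no indication that anything went wrong, 
while the strict variant throws.  A job reading the lenient result would 
process a partial input set without knowing it.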
                
> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to 
> deal with symlinks.  The issue is that code has been written which is 
> incompatible with the existence of things which are not files or directories. 
> For example, there is a lot of code out there that looks at 
> FileStatus#isFile, and if it returns false, assumes that what it is looking 
> at is a directory.  In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} 
> and {{FileSystem#globStatus}} fully resolve symlinks and ignore dangling 
> ones.  This will prevent incompatibility with existing MR jobs and other HDFS 
> users.  We should also add new versions of listStatus and globStatus that 
> allow new, symlink-aware code to deal with symlinks as symlinks.
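
The isFile assumption described in the issue can be sketched as follows.  
StatusSketch and Kind are hypothetical stand-ins for FileStatus, kept 
self-contained so the example runs without Hadoop on the classpath; only the 
isFile, isDirectory, and isSymlink method names mirror the real class.

```java
// Three kinds of entry; the legacy pattern below only accounts for two of them.
enum Kind { FILE, DIRECTORY, SYMLINK }

// Minimal stand-in for FileStatus.
final class StatusSketch {
    final Kind kind;

    StatusSketch(Kind kind) {
        this.kind = kind;
    }

    boolean isFile() { return kind == Kind.FILE; }
    boolean isDirectory() { return kind == Kind.DIRECTORY; }
    boolean isSymlink() { return kind == Kind.SYMLINK; }

    // Legacy pattern: anything that is not a file is assumed to be a directory.
    static String classifyLegacy(StatusSketch s) {
        return s.isFile() ? "file" : "directory";
    }

    // Symlink-aware pattern: each kind is checked explicitly.
    static String classifyAware(StatusSketch s) {
        if (s.isSymlink()) {
            return "symlink";
        }
        return s.isFile() ? "file" : "directory";
    }
}
```

Given an unresolved symlink, classifyLegacy reports "directory", which is the 
incorrect assumption the issue describes.  Resolving symlinks by default means 
legacy code like this never sees a symlink in the first place.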

