[ 
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799562#comment-13799562
 ] 

Daryn Sharp commented on HADOOP-9984:
-------------------------------------

bq. listStatus should NOT follow child symlinks. Fix all internal utilities, 
hive, pig, map reduce, yarn, etc to not use isDir() and understand that a 
directory may contain symlinks.

I do not agree.  This means symlinks are not transparent and not compatible 
with pre-2.x.  I also do not agree that any solution will/has to break existing 
apps.

Furthermore, the user will rarely if ever care that something is a symlink.  So 
every user that gets a file status through any of the existing API methods 
should _not_ be burdened with checking whether it's a symlink and then 
resolving it before checking various criteria - this is about more than just 
isDir().  What if I'm checking file size?  Or owner/group/permissions?  I 
expect the results to be of the target, not the link.

I think the only sensible solution that ensures compatibility is:
# A new filtered fs wrapper whose sole responsibility is resolving symlinks.  
FileSystem.get can automatically add the wrapper.  If the user really wants to 
see symlinks, they can call getRawFs.
# No other filesystem does symlink resolution of any kind.  I've outlined in 
other jiras how having individual filesystems resolve symlinks is fundamentally 
broken, e.g. viewfs.
# The new symlink-aware fs wrapper will return file statuses for symlinks that 
lazily resolve the file status a la RLFS.  Lazy resolution prevents 
unresolvable symlinks, which the user wasn't going to select based on name 
anyway, from causing exceptions.
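
A minimal sketch of the lazy-resolve idea in the third point, using a toy 
in-memory namespace instead of the real Hadoop classes.  Every name here 
(LazySymlinkSketch, Node, RAW, LazyStatus) is hypothetical and only 
illustrates the shape of the proposal; it is not the patch or an existing 
Hadoop API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LazySymlinkSketch {
    /** Toy inode: a file, a directory, or a symlink pointing at another path. */
    static final class Node {
        final boolean dir;
        final long length;
        final String linkTarget; // non-null only for symlinks

        Node(boolean dir, long length, String linkTarget) {
            this.dir = dir;
            this.length = length;
            this.linkTarget = linkTarget;
        }
    }

    /** Toy "raw" namespace (assumption): stores symlinks as-is, never resolves them. */
    static final Map<String, Node> RAW = new LinkedHashMap<>();

    /** Status returned by the wrapper; a symlink is resolved on first attribute access. */
    static final class LazyStatus {
        final String path;
        private Node resolved;

        LazyStatus(String path) {
            this.path = path;
        }

        private Node target() {
            if (resolved == null) {
                Node n = RAW.get(path);
                int hops = 0;
                // Follow the chain of links until a non-link node (or nothing) is found.
                while (n != null && n.linkTarget != null) {
                    if (++hops > 32) {
                        throw new IllegalStateException("too many links: " + path);
                    }
                    n = RAW.get(n.linkTarget);
                }
                if (n == null) {
                    throw new IllegalStateException("unresolvable symlink: " + path);
                }
                resolved = n;
            }
            return resolved;
        }

        boolean isDirectory() { return target().dir; }

        long getLen() { return target().length; }
    }

    /** Lists direct children of dir without touching any link target. */
    static List<LazyStatus> listStatus(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<LazyStatus> out = new ArrayList<>();
        for (String p : RAW.keySet()) {
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                out.add(new LazyStatus(p));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        RAW.put("/data", new Node(true, 0, null));
        RAW.put("/data/part-0", new Node(false, 128, null));
        RAW.put("/data/latest", new Node(false, 0, "/data/part-0"));
        RAW.put("/data/dangling", new Node(false, 0, "/gone"));
        for (LazyStatus s : listStatus("/data")) {
            if (s.path.endsWith("dangling")) {
                continue; // skipped by name, so the broken link is never resolved
            }
            System.out.println(s.path + " dir=" + s.isDirectory() + " len=" + s.getLen());
        }
    }
}
```

In this sketch the listing itself succeeds even though /data/dangling cannot 
be resolved; the exception would only surface if a caller asked that 
particular entry for its attributes.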

Let's make hadoop work like every other filesystem by making symlinks be 
transparent unless the user explicitly wants to know if something is a symlink.
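
As one illustration of that behavior outside Hadoop, the JDK's java.nio.file 
API on a local POSIX filesystem follows links by default and only exposes the 
link itself on explicit request.  The class and helper names below are made up 
for the example, and it assumes the platform allows creating symbolic links.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class TransparentSymlinkDemo {
    /** Creates dir/target.txt (5 bytes) and dir/link pointing at it; returns the link. */
    static Path makeLink(Path dir) throws IOException {
        Path target = Files.write(dir.resolve("target.txt"),
                "hello".getBytes(StandardCharsets.UTF_8));
        return Files.createSymbolicLink(dir.resolve("link"), target);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-demo");
        Path link = makeLink(dir);

        // Default calls follow the link and describe the target.
        System.out.println("regular file: " + Files.isRegularFile(link));
        System.out.println("size: " + Files.size(link));

        // Only an explicit opt-in reveals the link itself.
        System.out.println("is symlink: " + Files.isSymbolicLink(link));
        System.out.println("regular file (no follow): "
                + Files.isRegularFile(link, LinkOption.NOFOLLOW_LINKS));
    }
}
```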

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by 
> default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, 
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, 
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, 
> HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that 
> many existing HDFS clients would be broken by listStatus and globStatus 
> returning symlinks.  One example is applications that assume that 
> !FileStatus#isFile implies that the inode is a directory.  As we discussed in 
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning 
> resolved paths.



--
This message was sent by Atlassian JIRA
(v6.1#6144)