[ 
https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754695#comment-13754695
 ] 

Jason Lowe commented on HADOOP-9912:
------------------------------------

bq. If you look at readdir as an example, it does not automatically dereference 
by default. Neither does ls, unless you use the -L flag on Linux. I think 
that's the expected default behavior, showing the actual contents of the 
directory. It's possible to build a directory walking program via the current 
listStatus, it just requires dereferencing any links to see if the target is a 
directory. This appears to be what ls -R does.

Thanks for the rationale, Andrew.  However, I don't believe {{ls}} is a good 
example.  {{ls -l}} is symlink-aware and therefore expects to find them.  If 
you strace it, you'll notice it's using {{getdents}}, {{lstat}}, and 
{{readlink}}.  We can't really look to POSIX for a direct equivalent, since 
listStatus is a combination of readdir *and* stat.  The equivalent directory 
walker for POSIX calls readdir and then stat on each dir entry (not lstat, 
since the walker either is not symlink-aware or wants to follow symlinks) to 
determine whether each entry is another directory (because for POSIX, the type 
of a directory entry is not included in the dirent).
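
To make the stat vs. lstat distinction concrete, here is a minimal sketch using 
plain {{java.nio.file}} rather than the Hadoop API.  The class and method names 
({{DirEntryTypes}}, {{subdirs}}, etc.) are made up for illustration; the only 
assumption is that {{Files.isDirectory}} follows links by default (stat-like) 
and does not when given {{NOFOLLOW_LINKS}} (lstat-like).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
class DirEntryTypes {
    /** stat-like check: follows symlinks, so a link to a directory is a directory. */
    static boolean isDirFollowing(Path p) {
        return Files.isDirectory(p);
    }

    /** lstat-like check: does not follow symlinks, so a link is never a directory. */
    static boolean isDirNoFollow(Path p) {
        return Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS);
    }

    /** Sorted names of the entries a simple walker would descend into. */
    static List<String> subdirs(Path dir, boolean follow) throws IOException {
        List<String> out = new ArrayList<>();
        // readdir-like step: enumerate the entries of the directory
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path e : entries) {
                // stat-like or lstat-like step: decide the type of each entry
                boolean isDir = follow ? isDirFollowing(e) : isDirNoFollow(e);
                if (isDir) {
                    out.add(e.getFileName().toString());
                }
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("walk");
        Files.createDirectory(root.resolve("real"));
        Files.createSymbolicLink(root.resolve("link"), root.resolve("real"));
        System.out.println(subdirs(root, true));   // readdir + stat
        System.out.println(subdirs(root, false));  // readdir + lstat
    }
}
```

With readdir + stat the walker descends into both {{real}} and {{link}}; with 
readdir + lstat it only sees {{real}} as a directory and silently skips the 
link.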

If listStatus is a combination of readdir and lstat, then it breaks existing 
code that is not symlink-aware and expects isDir/isDirectory to return true for 
directories and isFile() to return true for files.  Lots of code has been 
written for FileSystem, and since FileSystem did not support symlinks until 
very recently, none of that code is symlink-aware.  Making listStatus expose 
symlinks to those callers is going to be problematic, just as it is for Pig 
here.  That's why there are symlink-aware forms of the stat calls: code that 
wants to be aware of symlinks can use them to detect links, while older code, 
or code that just wants to follow links, calls the original forms.
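
As a rough sketch of why the two forms matter to older callers, again using 
{{java.nio.file}} as a stand-in for the Hadoop calls: {{status}} plays the role 
of the original stat form and {{linkStatus}} the symlink-aware form.  Those 
names, and the {{classify}} helper representing legacy code, are hypothetical 
and only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration class; not part of Hadoop.
class StatForms {
    /** Original form: follows links, like stat. */
    static BasicFileAttributes status(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class);
    }

    /** Symlink-aware form: describes the link itself, like lstat. */
    static BasicFileAttributes linkStatus(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class,
                LinkOption.NOFOLLOW_LINKS);
    }

    /** Stand-in for legacy code that only knows about files and directories. */
    static String classify(BasicFileAttributes a) {
        if (a.isDirectory()) {
            return "dir";
        }
        if (a.isRegularFile()) {
            return "file";
        }
        return "unexpected";  // e.g. a symlink the caller was never written to handle
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("forms");
        Path real = Files.createDirectory(root.resolve("real"));
        Path link = Files.createSymbolicLink(root.resolve("link"), real);
        System.out.println(classify(status(link)));      // link resolved to its target
        System.out.println(classify(linkStatus(link)));  // link reported as a link
    }
}
```

The legacy {{classify}} logic works when handed the following form and falls 
into its "unexpected" branch when handed the symlink-aware form, which is the 
same kind of surprise a symlink-unaware caller gets if a listing starts 
returning link statuses.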

The proposed fix handles the issue for Pig with a local filesystem, but someone 
who uses Pig against an input directory that happens to be a symlink in HDFS is 
going to have the same issue.  My apologies if I'm missing something, but the 
more I think about it, the more I'm convinced that listStatus returning 
symlinks is not correct.  It's going to break existing code since almost all of 
that code is not expecting symlinks.
                
> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-9912
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9912
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-9912-testcase.patch
>
>
> globStatus for a path that is a symlink to a directory used to report the 
> resulting FileStatus as a directory but recently this has changed.
