[
https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754695#comment-13754695
]
Jason Lowe commented on HADOOP-9912:
------------------------------------
bq. If you look at readdir as an example, it does not automatically dereference
by default. Neither does ls, unless you use the -L flag on Linux. I think
that's the expected default behavior, showing the actual contents of the
directory. It's possible to build a directory walking program via the current
listStatus, it just requires dereferencing any links to see if the target is a
directory. This appears to be what ls -R does.
Thanks for the rationale, Andrew. However, I don't believe {{ls}} is a good
example. {{ls -l}} is symlink-aware and therefore expecting to find them. If
you strace it, you'll notice it's using {{getdents}}, {{lstat}}, and
{{readlink}}. We can't really look to POSIX for an equivalent, since
listStatus is a combination of readdir *and* stat. The equivalent directory
walker for POSIX calls readdir and then stat on each dir entry (not lstat,
since the walker either is not symlink-aware or wants to follow symlinks) to
determine whether each entry is another directory (because in POSIX the type of
a directory entry is not included in the dirent).
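To make the readdir-plus-stat versus readdir-plus-lstat distinction concrete, here is a minimal sketch using {{java.nio}} as a stand-in for the POSIX calls. It is not Hadoop code; the class name {{SymlinkWalk}} and its helper methods are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop or Pig.
class SymlinkWalk {

    /** Returns the sorted names of the entries in dir that are classified as directories. */
    public static List<String> subdirectories(Path dir, boolean followLinks) {
        List<String> result = new ArrayList<>();
        // Listing the directory plays the role of readdir.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                boolean isDir = followLinks
                        ? Files.isDirectory(entry)                              // stat-like: follows links
                        : Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS);  // lstat-like: does not
                if (isDir) {
                    result.add(entry.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result);
        return result;
    }

    /** Builds a throwaway tree containing realdir/, file.txt, and linkdir -> realdir. */
    public static Path buildSampleTree() {
        try {
            Path root = Files.createTempDirectory("symlink-walk");
            Path realDir = Files.createDirectory(root.resolve("realdir"));
            Files.createFile(root.resolve("file.txt"));
            Files.createSymbolicLink(root.resolve("linkdir"), realDir);
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = buildSampleTree();
        System.out.println("stat-style:  " + subdirectories(root, true));
        System.out.println("lstat-style: " + subdirectories(root, false));
    }
}
```

With the stat-style check, a walker that knows nothing about symlinks still descends into {{linkdir}}; with the lstat-style check, it silently skips it unless it dereferences the link itself.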
If listStatus is a combination of readdir and lstat then it breaks existing
code that is not symlink-aware and expects isDir/isDirectory to return true for
directories and isFile() to return true for files. Lots of code has been
written for FileSystem, and since FileSystem did not support symlinks until
very recently, none of that code is symlink-aware. Making listStatus
expose symlinks to those callers is going to be problematic, just as it is for
Pig here. That's why there are symlink-aware forms of stat calls so that code
that desires to be aware of symlinks can detect them, and older code or code
that just wants to follow them calls the original forms.
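The breakage described above can be sketched as follows. This is an analogy built on {{java.nio}} attributes rather than Hadoop's FileStatus, and the class {{LegacyClassifier}} and its methods are hypothetical names used only to show how directory-or-file logic behaves when it is handed a symlink's own status.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration class; java.nio stands in for the FileSystem API here.
class LegacyClassifier {

    /** Legacy-style logic that assumes every entry is either a directory or a file. */
    public static String classify(BasicFileAttributes attrs) {
        if (attrs.isDirectory()) {
            return "dir";
        }
        if (attrs.isRegularFile()) {
            return "file";
        }
        return "unexpected"; // the status of a symlink itself lands here
    }

    /** Original form: follows links, so a link to a directory looks like a directory. */
    public static BasicFileAttributes status(Path p) {
        try {
            return Files.readAttributes(p, BasicFileAttributes.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Symlink-aware form: reports on the link itself instead of its target. */
    public static BasicFileAttributes linkStatus(Path p) {
        try {
            return Files.readAttributes(p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temporary directory plus a symlink to it and returns the link. */
    public static Path sampleLink() {
        try {
            Path root = Files.createTempDirectory("legacy-classifier");
            Path target = Files.createDirectory(root.resolve("input"));
            return Files.createSymbolicLink(root.resolve("input-link"), target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path link = sampleLink();
        System.out.println("original form:      " + classify(status(link)));
        System.out.println("symlink-aware form: " + classify(linkStatus(link)));
    }
}
```

Code written before symlinks existed only ever saw the first kind of status, which is why handing it the second kind by default changes its behavior.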
The proposed fix handles the issue for Pig with a local filesystem, but someone
who uses Pig against an input directory that happens to be a symlink in HDFS is
going to have the same issue. My apologies if I'm missing something, but the
more I think about it, the more I'm convinced that listStatus returning
symlinks is not correct. It's going to break existing code since almost all of
that code is not expecting symlinks.
> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
> Key: HADOOP-9912
> URL: https://issues.apache.org/jira/browse/HADOOP-9912
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Priority: Blocker
> Attachments: HADOOP-9912-testcase.patch
>
>
> globStatus for a path that is a symlink to a directory used to report the
> resulting FileStatus as a directory but recently this has changed.