[ 
https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753620#comment-13753620
 ] 

Daryn Sharp commented on HADOOP-9912:
-------------------------------------

bq. The intended behavior of Globber.glob (which calls listStatus) is to return 
symlink rather than symlink target I believe

bq. I guess for a long time, pig is using this behavior(listStatus return 
symlink target rather than symlink), I am afraid this behavior is wrong and is 
inconsistent with HDFS. 

Wrong. Wrong. Wrong.  {{listStatus}} resolves symlinks.  {{globStatus}} is 
supposed to be equivalent to {{listStatus}} with wildcard support.  All 
existing code depends on these semantics, and rightly so.  Symlinks should be 
transparent to users unless they specifically want to know if a path is a 
symlink.  That's why there is a counterpart to {{getFileStatus}} called 
{{getFileLinkStatus}} which does not resolve symlinks.
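The resolving versus non-resolving distinction can be sketched with plain {{java.nio.file}} instead of the Hadoop {{FileSystem}} API. This is only an analogy, under the assumption that following a link corresponds to {{getFileStatus}} and {{NOFOLLOW_LINKS}} corresponds to {{getFileLinkStatus}}. The class and method names are hypothetical, and it needs a filesystem that allows creating symlinks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class SymlinkStatusDemo {
    /** Resolving check: follows the link (analogous in spirit to getFileStatus). */
    public static boolean isDirResolved(Path p) {
        return Files.isDirectory(p);
    }

    /** Non-resolving check: inspects the link itself (analogous in spirit to getFileLinkStatus). */
    public static boolean isDirUnresolved(Path p) {
        return Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: <tmp>/target is a real directory, <tmp>/symlink points at it.
        Path base = Files.createTempDirectory("symlink-demo");
        Path target = Files.createDirectory(base.resolve("target"));
        Path link = Files.createSymbolicLink(base.resolve("symlink"), target);

        System.out.println("resolved isDirectory:   " + isDirResolved(link));
        System.out.println("unresolved isDirectory: " + isDirUnresolved(link));
        System.out.println("isSymbolicLink:         " + Files.isSymbolicLink(link));
    }
}
```

A caller that only asks "is this a directory?" gets the answer it expects from the resolving check. Only a caller that explicitly cares about links needs the non-resolving one.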

HADOOP-9877 fundamentally broke the semantics of {{globStatus}}: the behavior 
now depends on whether the last path component is a glob or static.  The result is:
* /path/symlink - the static component "symlink" results in a file status of 
the symlink, breaking isFile/isDir/etc
* /path/sym*link - the glob component "sym*link" returns the file status of the 
resolved link, working as expected

{{globStatus}} _must_ consistently return resolved paths.  The semantics 
altered by HADOOP-9877 will break lots of code.  I'm pretty sure that includes 
{{FsShell}}.  We cannot break long-standing semantics just for snapshots.

Why does .snapshot support require a {{getFileLinkStatus}}?  Does 
{{getFileStatus}} not work for a .snapshot directory?
                
> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-9912
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9912
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-9912-testcase.patch
>
>
> globStatus for a path that is a symlink to a directory used to report the 
> resulting FileStatus as a directory but recently this has changed.

