[ 
https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757361#comment-13757361
 ] 

Eli Collins commented on HADOOP-9912:
-------------------------------------

Webex sounds good to me too.

bq. Is it unreasonable to have listStatus resolve symlinks and provide a 
separate API or flag for symlink-aware clients?

IMO listStatus is equivalent to readdir and should therefore not resolve paths 
(it lists each entry as either file/dir/link). If users need an API that lists 
the statuses in a directory and resolves each, we (or they) can write a helper 
function that does the same thing but resolves links. This would be no less 
efficient, since links are resolved by the client anyway, and it's not clear 
that good semantics exist (do you fail if a link fails to resolve? do dangling 
links stay links while everything else is resolved?), in which case it's 
better not to have this behavior as part of the core API.
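
A minimal sketch of what such a helper could look like, written against 
java.nio.file on a local filesystem as a stand-in rather than against the 
Hadoop API. The class and method names (ResolvingLister, list, listResolved) 
are hypothetical, and the policy of leaving dangling links as links is just 
one of the possible answers to the semantics question above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; names are illustrative and not part of any Hadoop API. */
final class ResolvingLister {

    /** What a caller sees for one directory entry. */
    enum Kind { FILE, DIRECTORY, SYMLINK }

    static final class Entry {
        final Path path;
        final Kind kind;

        Entry(Path path, Kind kind) {
            this.path = path;
            this.kind = kind;
        }
    }

    /** readdir-like listing: each entry is reported as file, dir, or link; nothing is resolved. */
    static List<Entry> list(Path dir) throws IOException {
        List<Entry> out = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                out.add(new Entry(p, kindOf(p, false)));
            }
        }
        return out;
    }

    /**
     * Helper layered on top of list(): links are resolved on the client side.
     * Assumed policy: a dangling link stays a SYMLINK instead of failing the call.
     */
    static List<Entry> listResolved(Path dir) throws IOException {
        List<Entry> out = new ArrayList<>();
        for (Entry e : list(dir)) {
            // Files.exists follows links, so it is false for a dangling link.
            if (e.kind == Kind.SYMLINK && Files.exists(e.path)) {
                out.add(new Entry(e.path, kindOf(e.path, true)));
            } else {
                out.add(e);
            }
        }
        return out;
    }

    private static Kind kindOf(Path p, boolean followLinks) {
        if (!followLinks && Files.isSymbolicLink(p)) {
            return Kind.SYMLINK;
        }
        // Files.isDirectory follows links by default.
        return Files.isDirectory(p) ? Kind.DIRECTORY : Kind.FILE;
    }
}
```

A caller that wants readdir semantics uses list(); one that wants links 
followed uses listResolved(). Failing on a dangling link instead would be an 
equally defensible policy, which is part of why this seems better kept out of 
the core call.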

If we change FileSystem#listStatus to resolve links then we need to change 
FileContext#listStatus as well, and that has supported but not resolved links 
for several releases. And does the iterable version of listStatus resolve links 
by default now too? Clearly FileSystem has more compatibility concerns than 
FileContext, but I don't see an option where we preserve compatibility. We're 
balancing compatibility against friendly semantics (would a typical caller 
expect that they need to pass a flag to listStatus to prevent it from resolving 
links?). While I agree we should help the transition by providing an API, 
it's not clear to me it should be the default. And if we do provide a helper 
that's not the default, would it be easier for frameworks like Pig to just 
update the relevant code to check the FileStatus? They'll need to do this 
anyway if they have assumptions like HADOOP-6585, and it seems like they might 
want to do something different for links to directories than for links to 
files, in which case one helper might not work for everyone.
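
To illustrate the caller-side alternative, here is a rough sketch of a 
framework checking each entry itself and treating links to directories 
differently from links to files. It again uses java.nio.file on a local 
filesystem as a stand-in for inspecting a FileStatus; LinkAwareCaller, 
classify and shouldDescend are hypothetical names, and the "descend" policy 
is only an example of something a given framework might want.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical caller-side check; a local-filesystem stand-in for inspecting a FileStatus. */
final class LinkAwareCaller {

    enum Target { FILE, DIRECTORY, LINK_TO_FILE, LINK_TO_DIRECTORY, DANGLING_LINK }

    /** Classifies a path that is assumed to come from a directory listing. */
    static Target classify(Path p) {
        if (!Files.isSymbolicLink(p)) {
            return Files.isDirectory(p) ? Target.DIRECTORY : Target.FILE;
        }
        // Files.exists follows the link, so it is false when the target is missing.
        if (!Files.exists(p)) {
            return Target.DANGLING_LINK;
        }
        // Files.isDirectory also follows the link and reports on the target.
        return Files.isDirectory(p) ? Target.LINK_TO_DIRECTORY : Target.LINK_TO_FILE;
    }

    /** Example policy: descend into real directories and into links to directories. */
    static boolean shouldDescend(Path p) {
        Target t = classify(p);
        return t == Target.DIRECTORY || t == Target.LINK_TO_DIRECTORY;
    }
}
```

Another framework could just as reasonably refuse to descend through links 
at all, which is the sense in which a single shared helper might not fit 
everyone.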

I agree with Andrew that we don't want to set the symlink bit for a non-symlink 
(resolved) FileStatus as that would definitely break/confuse some things.
                
> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-9912
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9912
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-9912-testcase.patch
>
>
> globStatus for a path that is a symlink to a directory used to report the 
> resulting FileStatus as a directory but recently this has changed.
