[
https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757361#comment-13757361
]
Eli Collins commented on HADOOP-9912:
-------------------------------------
Webex sounds good to me too.
bq. Is it unreasonable to have listStatus resolve symlinks and provide a
separate API or flag for symlink-aware clients?
IMO listStatus is equivalent to readdir and should therefore not resolve paths
(it lists each entry as either file/dir/link). If users need an API that lists
the statuses in a directory and resolves each one, we (or they) can write a
helper function that does the same thing but resolves links. This would be no
less efficient, since links are resolved by the client anyway, and it's not
clear that good semantics exist (do you fail if a link fails to resolve? do
dangling links stay links while everything else is resolved?), in which case
it's better not to have this behavior as part of the core API.
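To make the distinction concrete, here is a minimal sketch of the two behaviors.
It uses java.nio.file on a local filesystem as a stand-in, not the Hadoop
FileSystem/FileContext API, and every name in it (LinkAwareListing, Kind, list,
listResolved, makeDemoDir) is hypothetical. The dangling-link policy (leave the
entry reported as a link) is just one of the possible choices, picked here for
illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: local-filesystem analogue of the discussion.
final class LinkAwareListing {
    enum Kind { FILE, DIRECTORY, SYMLINK }

    // Classify one entry WITHOUT following links (lstat-like).
    static Kind classify(Path p) {
        if (Files.isSymbolicLink(p)) {
            return Kind.SYMLINK;
        }
        return Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS) ? Kind.DIRECTORY : Kind.FILE;
    }

    // readdir-style listing: each entry is reported as file/dir/link, unresolved.
    static Map<String, Kind> list(Path dir) {
        Map<String, Kind> out = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path e : entries) {
                out.put(e.getFileName().toString(), classify(e));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out;
    }

    // Helper layered on top: same listing, but links are resolved by the caller.
    // Assumed policy: a dangling link stays reported as SYMLINK instead of failing.
    static Map<String, Kind> listResolved(Path dir) {
        Map<String, Kind> out = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path e : entries) {
                Kind kind = classify(e);
                // Files.exists and Files.isDirectory follow links by default.
                if (kind == Kind.SYMLINK && Files.exists(e)) {
                    kind = Files.isDirectory(e) ? Kind.DIRECTORY : Kind.FILE;
                }
                out.put(e.getFileName().toString(), kind);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out;
    }

    // Builds a temp directory with a dir, a file, and three links (one dangling).
    static Path makeDemoDir() {
        try {
            Path dir = Files.createTempDirectory("listing-demo");
            Files.createDirectory(dir.resolve("dir"));
            Files.createFile(dir.resolve("file"));
            Files.createSymbolicLink(dir.resolve("link-to-dir"), dir.resolve("dir"));
            Files.createSymbolicLink(dir.resolve("link-to-file"), dir.resolve("file"));
            Files.createSymbolicLink(dir.resolve("dangling"), dir.resolve("missing"));
            return dir;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Path dir = makeDemoDir();
        System.out.println("unresolved: " + list(dir));
        System.out.println("resolved:   " + listResolved(dir));
    }
}
```

On the demo directory, list reports all three links as SYMLINK, while
listResolved reports link-to-dir as DIRECTORY and link-to-file as FILE and
leaves the dangling link as SYMLINK. A caller that wants to fail on dangling
links, or to treat links to directories differently from links to files, would
need a different helper, which is the semantics question raised above.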
If we change FileSystem#listStatus to resolve links then we need to change
FileContext#listStatus as well and that has supported but not resolved links
for several releases. And does the iterable version of listStatus resolve links
by default now too? Clearly FileSystem has more compatibility concerns than
FileContext, but I don't see an option where we preserve compatibility. We're
balancing compatibility against friendly semantics (would a typical caller
expect that they need to pass a flag to listStatus to prevent it from resolving
links?). While I agree we should help the transition by providing an API, it's
not clear to me that it should be the default. And if we do provide a helper
that's not the default, would it be easier for frameworks like Pig to just
update the relevant code to check the FileStatus? They'll need to do this
anyway if they have assumptions like HADOOP-6585, and it seems like they might
want to do something different for links to directories than for links to
files, in which case one helper might not work for everyone.
I agree with Andrew that we don't want to set the symlink bit for a non-symlink
(resolved) FileStatus as that would definitely break/confuse some things.
> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
> Key: HADOOP-9912
> URL: https://issues.apache.org/jira/browse/HADOOP-9912
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Priority: Blocker
> Attachments: HADOOP-9912-testcase.patch
>
>
> globStatus for a path that is a symlink to a directory used to report the
> resulting FileStatus as a directory but recently this has changed.