[
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522497#comment-14522497
]
Colin Patrick McCabe commented on HADOOP-9984:
----------------------------------------------
I've asked some people more familiar with the upper layers of the stack to
comment on the security issues of symlinks. When we last asked, those issues
were quite real and very scary. It's worth noting that even Linux software,
which has had to deal with symlinks for decades, still often has security
vulnerabilities caused by symlinks.
The globStatus issue is essentially the same issue as this one. Should
globStatus resolve symlinks or not? In the case of globStatus, things are even
worse if you choose to resolve symlinks, since then you can glob for '*foo' and
get back 'bar'. A lot of software breaks if globs return file names that
the glob doesn't match. A lot of users get highly confused, as well, when
using FsShell. I do not think globStatus should resolve symlinks, but the same
group of "I don't want to ever think about a FileStatus type other than file or
dir" people argued in favor of it.
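To make the '*foo' versus 'bar' problem concrete, here is a minimal sketch. It
does not use the Hadoop API at all: the directory listing is a hypothetical
in-memory map (entry name to symlink target), the names linkfoo, realfoo and
bar are made up, and the matching is done with the JDK's own glob PathMatcher.
It only illustrates how resolving symlinks after matching can hand back a name
the pattern never matched.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GlobSymlinkSketch {
    // Hypothetical directory listing: entry name -> symlink target
    // (null means the entry is not a symlink). Not real Hadoop data.
    static final Map<String, String> ENTRIES = new LinkedHashMap<>();
    static {
        ENTRIES.put("linkfoo", "bar"); // symlink 'linkfoo' pointing at 'bar'
        ENTRIES.put("realfoo", null);  // ordinary file
        ENTRIES.put("bar", null);      // ordinary file, does not match '*foo'
    }

    // Match entry names against the glob, then optionally replace each
    // matching symlink with the name of its target.
    public static List<String> glob(String pattern, boolean resolveSymlinks) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> e : ENTRIES.entrySet()) {
            if (!matcher.matches(Paths.get(e.getKey()))) {
                continue;
            }
            String target = e.getValue();
            results.add(resolveSymlinks && target != null ? target : e.getKey());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(glob("*foo", false)); // [linkfoo, realfoo]
        System.out.println(glob("*foo", true));  // [bar, realfoo]
    }
}
```

With resolution turned on, a caller that re-checks the results against '*foo',
or a user reading the output in a shell, sees 'bar' and has no way to tell
which matching entry it came from.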
With regard to adding new APIs: let's be honest. HDFS users take years to
update to using new APIs, if they ever do. Moving to new APIs is a huge pain
because it means that they have to drop compatibility with older versions of
Hadoop. For example, Apache Spark is still supporting Hadoop 1.x, so they
won't use any API newer than that. Admittedly, this is kind of an extreme
example, but even projects with more reasonable compat policies like HBase will
want to wait a year or two before dropping support for a Hadoop release. And
even when the compatibility window opens up to use the new API, they have to
understand why they should use it and make an active effort to do so. I think
the number of people who would use listStatus2, if it existed, is extremely
small.
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by
> default
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-9984
> URL: https://issues.apache.org/jira/browse/HADOOP-9984
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 2.1.0-beta
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Critical
> Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch,
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch,
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch,
> HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that
> many existing HDFS clients would be broken by listStatus and globStatus
> returning symlinks. One example is applications that assume that
> !FileStatus#isFile implies that the inode is a directory. As we discussed in
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.
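
The "!isFile implies directory" assumption from the description can be
sketched as follows. The Status class here is a hypothetical stand-in, not
Hadoop's FileStatus; it only borrows the isFile/isDirectory/isSymlink method
names to show how a two-way check misclassifies a third kind of entry.

```java
public class EntryTypeSketch {
    // Hypothetical entry kinds, for illustration only.
    public enum Kind { FILE, DIRECTORY, SYMLINK }

    // Hypothetical stand-in for a FileStatus-like object.
    public static final class Status {
        private final Kind kind;

        public Status(Kind kind) { this.kind = kind; }

        public boolean isFile() { return kind == Kind.FILE; }
        public boolean isDirectory() { return kind == Kind.DIRECTORY; }
        public boolean isSymlink() { return kind == Kind.SYMLINK; }
    }

    // Legacy client pattern: anything that is not a file is treated as a
    // directory. A symlink falls into the "directory" branch.
    public static String classifyLegacy(Status s) {
        return s.isFile() ? "file" : "directory";
    }

    // Three-way check that handles the symlink case explicitly.
    public static String classifyExplicit(Status s) {
        if (s.isFile()) {
            return "file";
        }
        if (s.isDirectory()) {
            return "directory";
        }
        return "symlink";
    }

    public static void main(String[] args) {
        Status link = new Status(Kind.SYMLINK);
        System.out.println(classifyLegacy(link));   // directory
        System.out.println(classifyExplicit(link)); // symlink
    }
}
```

A client written in the legacy style would go on to list or recurse into the
symlink as if it were a directory, which is the kind of breakage that
returning resolved entries by default is meant to avoid.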