[ 
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522497#comment-14522497
 ] 

Colin Patrick McCabe commented on HADOOP-9984:
----------------------------------------------

I've asked some people more familiar with the upper layers of the stack to 
comment on the security issues of symlinks.  When we last asked about them, the 
issues were quite real and very scary.  It's worth noting that even Linux 
software, which has had to deal with symlinks for decades, still often has 
security vulnerabilities caused by symlinks.

The globStatus issue is essentially the same issue as this one.  Should 
globStatus resolve symlinks or not?  In the case of globStatus, things are even 
worse if you choose to resolve symlinks, since then you can glob for '*foo' and 
get back 'bar'.  A lot of software breaks if globs return file names that 
the glob doesn't match.  A lot of users get highly confused, as well, when 
using FsShell.  I do not think globStatus should resolve symlinks, but the same 
group of "I don't want to ever think about a FileStatus type other than file or 
dir" people argued in favor of it.
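
To make the '*foo' versus 'bar' problem concrete, here is a minimal sketch.  
It uses java.nio.file on a local filesystem as a stand-in, not Hadoop's 
FileSystem#globStatus, and it assumes a platform where creating symlinks is 
permitted.  The class name GlobSymlinkDemo and its helper methods are 
hypothetical, made up for illustration only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: local-filesystem analogue of a glob that may or may not
// resolve symlinks before reporting its matches.
final class GlobSymlinkDemo {

    // Globs 'dir' with 'pattern' and returns the reported names, sorted.
    // If 'resolve' is true, each match is replaced by its symlink target.
    public static List<String> globNames(Path dir, String pattern, boolean resolve)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, pattern)) {
            for (Path match : matches) {
                Path reported = resolve ? match.toRealPath() : match;
                names.add(reported.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Builds a scratch directory holding a file "bar" and a symlink "xfoo" -> "bar".
    public static Path makeFixture() throws IOException {
        Path dir = Files.createTempDirectory("glob-symlink-demo");
        Path bar = Files.createFile(dir.resolve("bar"));
        Files.createSymbolicLink(dir.resolve("xfoo"), bar);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path dir = makeFixture();
        // Without resolution, the result matches the pattern.
        System.out.println("unresolved: " + globNames(dir, "*foo", false));
        // With resolution, the caller gets a name the pattern does not match.
        System.out.println("resolved:   " + globNames(dir, "*foo", true));
    }
}
```

In the resolved case the caller asked for '*foo' and got 'bar', so any code 
that re-checks results against the pattern, or a user reading shell output, 
sees a name that the glob could never have produced.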

With regard to adding new APIs: let's be honest.  HDFS users take years to 
update to using new APIs, if they ever do.  Moving to new APIs is a huge pain 
because it means that they have to drop compatibility with older versions of 
Hadoop.  For example, Apache Spark is still supporting Hadoop 1.x, so they 
won't use any API newer than that.  Admittedly, this is kind of an extreme 
example, but even projects with more reasonable compat policies like HBase will 
want to wait a year or two before dropping support for a Hadoop release.  And 
even when the compatibility window opens up to use the new API, they have to 
understand why they should use it and make an active effort to do so.  I think 
the number of people who would use listStatus2, if it existed, is extremely 
small.

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by 
> default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, 
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, 
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, 
> HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that 
> many existing HDFS clients would be broken by listStatus and globStatus 
> returning symlinks.  One example is applications that assume that 
> !FileStatus#isFile implies that the inode is a directory.  As we discussed in 
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning 
> resolved paths.
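
The "!isFile implies directory" assumption quoted above can be sketched as 
follows.  This uses java.nio.file on a local filesystem as an analogue rather 
than Hadoop's FileStatus, and the class InodeKindDemo, its Kind enum, and both 
methods are hypothetical names chosen for illustration.  It assumes a platform 
where creating symlinks is permitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Illustration only: a two-way file/directory check misclassifies symlinks.
final class InodeKindDemo {

    public enum Kind { FILE, DIRECTORY, SYMLINK, OTHER }

    // The fragile pattern: anything that is not a regular file is assumed
    // to be a directory.
    public static Kind naiveKind(Path p) {
        return Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)
                ? Kind.FILE
                : Kind.DIRECTORY;
    }

    // A check that treats a symlink as its own kind of entry.
    public static Kind carefulKind(Path p) {
        if (Files.isSymbolicLink(p)) {
            return Kind.SYMLINK;
        }
        if (Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)) {
            return Kind.FILE;
        }
        if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
            return Kind.DIRECTORY;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("inode-kind-demo");
        Path file = Files.createFile(dir.resolve("data"));
        Path link = Files.createSymbolicLink(dir.resolve("link"), file);
        System.out.println("naive:   " + naiveKind(link));
        System.out.println("careful: " + carefulKind(link));
    }
}
```

A client written in the naive style would try to descend into the symlink as 
if it were a directory, which is the kind of breakage that motivates returning 
resolved paths by default.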



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
