[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790972#comment-13790972 ]

Sanjay Radia commented on HADOOP-9984:
--------------------------------------

bq. Daryn, the discussion about resolved paths versus unresolved ones belongs 
on HADOOP-9780, not here.
At least some of the points in Daryn's comments of Oct 4th apply to HADOOP-9984
rather than HADOOP-9780.

HADOOP-9984's latest patch resolves symlinks in listStatus: if the target
directory denoted by the path has children that are symlinks, those symlinks
are resolved. The goal is to let old applications that did
"if (!stat.isDir()) then AssumeItIsAFile" keep working unchanged.
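
To make the compatibility argument concrete, here is a minimal, self-contained
Java sketch of the legacy pattern. It does not use Hadoop: InodeKind,
InodeStatus and LegacyCheckDemo are hypothetical stand-ins for an inode type
and FileStatus, and the sketch assumes an unresolved listing reports a symlink
as a third kind that is neither a file nor a directory. Under that assumption
both two-way shortcuts (!isDir() means file, !isFile() means directory)
misclassify a link, while resolving it to its target restores the two-way split.

```java
import java.util.Map;

// Hypothetical model, not Hadoop's API: an inode is a file, a directory or a symlink.
enum InodeKind { FILE, DIRECTORY, SYMLINK }

// Hypothetical stand-in for FileStatus.
final class InodeStatus {
    final String path;
    final InodeKind kind;
    final String target; // symlink target path; null for files and directories

    InodeStatus(String path, InodeKind kind, String target) {
        this.path = path;
        this.kind = kind;
        this.target = target;
    }

    boolean isFile() { return kind == InodeKind.FILE; }
    boolean isDirectory() { return kind == InodeKind.DIRECTORY; }
}

class LegacyCheckDemo {
    // Old application logic: anything that is not a directory must be a file.
    static boolean legacyTreatsAsFile(InodeStatus s) { return !s.isDirectory(); }

    // The mirror-image assumption: anything that is not a file must be a directory.
    static boolean legacyTreatsAsDirectory(InodeStatus s) { return !s.isFile(); }

    // Replace a symlink with its target's status; other entries pass through unchanged.
    // Assumes one level of indirection and that the target exists in the namespace.
    static InodeStatus resolve(InodeStatus s, Map<String, InodeStatus> namespace) {
        return s.kind == InodeKind.SYMLINK ? namespace.get(s.target) : s;
    }

    public static void main(String[] args) {
        Map<String, InodeStatus> namespace = Map.of(
            "/x/dir", new InodeStatus("/x/dir", InodeKind.DIRECTORY, null),
            "/x/file", new InodeStatus("/x/file", InodeKind.FILE, null));
        InodeStatus linkToDir = new InodeStatus("/foo/bar/d1", InodeKind.SYMLINK, "/x/dir");
        InodeStatus linkToFile = new InodeStatus("/foo/bar/f1", InodeKind.SYMLINK, "/x/file");

        // Unresolved: both links are misclassified by the legacy checks.
        System.out.println(legacyTreatsAsFile(linkToDir));       // true (wrong)
        System.out.println(legacyTreatsAsDirectory(linkToFile)); // true (wrong)

        // Resolved: the legacy checks see the targets and answer correctly.
        System.out.println(legacyTreatsAsFile(resolve(linkToDir, namespace)));       // false
        System.out.println(legacyTreatsAsDirectory(resolve(linkToFile, namespace))); // false
    }
}
```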

Let's consider the following example.
Say the directory /foo/bar has children a, b, c and d, and c is a symlink to
/x/a. With the patch, listStatus(/foo/bar) returns an array of FileStatus for
a, b, *a*, d. The repeated a appears because /foo/bar/c is resolved and the
status of its target, /x/a, is returned in its place.
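
The example can be reproduced with a small, hypothetical Java model (again not
the Hadoop API). It assumes a resolving listing simply substitutes each
symlink child's path with its target path, with one level of indirection, and
then checks the directory invariant that no two entries share a final path
component. The method names resolvedListing, name and namesUnique are
illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ResolvedListingDemo {
    // Hypothetical resolving listing: each child whose path appears in 'links'
    // is replaced by its symlink target's path. Assumes one level of indirection.
    static List<String> resolvedListing(String dir, List<String> children,
                                        Map<String, String> links) {
        List<String> paths = new ArrayList<>();
        for (String child : children) {
            String path = dir + "/" + child;
            paths.add(links.getOrDefault(path, path));
        }
        return paths;
    }

    // Final path component, i.e. the name a caller would use for the entry.
    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // The directory invariant: no two entries in a listing share a name.
    static boolean namesUnique(List<String> paths) {
        Set<String> seen = new HashSet<>();
        for (String p : paths) {
            if (!seen.add(name(p))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // /foo/bar has children a, b, c, d; c is a symlink to /x/a.
        List<String> listing = resolvedListing("/foo/bar",
            List.of("a", "b", "c", "d"), Map.of("/foo/bar/c", "/x/a"));
        System.out.println(listing);              // [/foo/bar/a, /foo/bar/b, /x/a, /foo/bar/d]
        System.out.println(namesUnique(listing)); // false: "a" appears twice
    }
}
```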

This is a spec violation: listStatus is supposed to return a set of unique
directory entries, since a directory cannot contain duplicate names. Further,
if someone were using listStatus to copy the contents of /foo/bar, the copy
would fail with a FileAlreadyExistsException. Daryn gives an example of
someone trying to do a rename and getting tripped up by the duplicate entry.
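
A naive copy loop shows how the duplicate turns into a failure. This is a
hypothetical Java sketch: copyAll only records destination paths in a set
instead of copying data, and it uses java.nio.file.FileAlreadyExistsException
as a stand-in for Hadoop's exception of the same name. It assumes the copier
names each destination after the final path component of the listed entry
and refuses to overwrite.

```java
import java.nio.file.FileAlreadyExistsException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NaiveCopyDemo {
    // Hypothetical copier: creates destDir/<name> for every listed path without
    // overwriting. The returned set stands in for the destination namespace.
    static Set<String> copyAll(List<String> listedPaths, String destDir)
            throws FileAlreadyExistsException {
        Set<String> created = new HashSet<>();
        for (String src : listedPaths) {
            String dest = destDir + "/" + src.substring(src.lastIndexOf('/') + 1);
            // A create-without-overwrite refuses a path that already exists.
            if (!created.add(dest)) {
                throw new FileAlreadyExistsException(dest);
            }
        }
        return created;
    }

    public static void main(String[] args) {
        // The resolved listing of /foo/bar from the example: c has become /x/a.
        List<String> listing = List.of("/foo/bar/a", "/foo/bar/b", "/x/a", "/foo/bar/d");
        try {
            copyAll(listing, "/backup");
            System.out.println("copied");
        } catch (FileAlreadyExistsException e) {
            System.out.println("failed on " + e.getFile()); // failed on /backup/a
        }
    }
}
```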

One could argue that, for some of the other issues Daryn raises, the
application writer should have been using another API. I picked the duplicate
entry because it breaks a fundamental invariant of a directory: all of its
children have unique names.

I am not offering a solution in this comment (although I have two suggestions).
I first want us to agree that the current patch, which resolves symlinks in
listStatus, has a serious issue.

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by 
> default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, 
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, 
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, 
> HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that 
> many existing HDFS clients would be broken by listStatus and globStatus 
> returning symlinks.  One example is applications that assume that 
> !FileStatus#isFile implies that the inode is a directory.  As we discussed in 
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning 
> resolved paths.



--
This message was sent by Atlassian JIRA
(v6.1#6144)