[ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790972#comment-13790972 ]
Sanjay Radia commented on HADOOP-9984:
--------------------------------------

bq. Daryn, the discussion about resolved paths versus unresolved ones belongs on HADOOP-9780, not here.

At least some of the points in Daryn's comments on Oct 4th apply to HADOOP-9984 rather than HADOOP-9780. HADOOP-9984's latest patch resolves symlinks for listStatus, i.e. if the target directory denoted by the path has children that are symlinks, those symlinks will be resolved (so that old apps that did "if (!stat.isDir()) then assumeItIsAFile" keep working unchanged).

Let's consider the following example. Say the directory /foo/bar has children a, b, c, d, and c is a symlink to /x/a. With the patch, listStatus(/foo/bar) will return an array of FileStatus for a, b, *a*, d. The repeated a appears because /foo/bar/c is resolved and its target /x/a is returned.

This is a spec violation: listStatus is supposed to return a set of unique directory entries (since a directory cannot have duplicate names). Further, if someone were using listStatus to copy the contents of /foo/bar, the copy operation would fail with a FileAlreadyExistsException. Daryn gives an example where someone is trying to do a rename and gets tripped up by the duplicate entry.

One could argue that for some of the other issues Daryn raises, the application writer should have been using another API. I picked the duplicates issue because it breaks a fundamental invariant of a directory, i.e. that all its children have unique names.

I am not offering any solution in this comment (although I have two suggestions). I want us to first agree that the current patch, which resolves symlinks for listStatus, has a serious issue.
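
To make the /foo/bar example above concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API or the patch itself: the class ListStatusDuplicates, the map-based directory model, and the helper names are all hypothetical, and they only imitate a listing that replaces each symlink child with its target.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ListStatusDuplicates {
    /** Hypothetical model of /foo/bar: child name -> symlink target, or null for a regular child. */
    static Map<String, String> fooBar() {
        Map<String, String> children = new LinkedHashMap<>();
        children.put("a", null);
        children.put("b", null);
        children.put("c", "/x/a"); // c is a symlink to /x/a
        children.put("d", null);
        return children;
    }

    /** Imitates a resolving listing: each symlink child is reported as its target path. */
    static List<String> listResolved(String dir, Map<String, String> children) {
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, String> e : children.entrySet()) {
            String target = e.getValue();
            paths.add(target != null ? target : dir + "/" + e.getKey());
        }
        return paths;
    }

    /** Final path component, which a copy loop would typically use as the destination name. */
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Returns every name that occurs more than once among the listed paths. */
    static List<String> duplicateNames(List<String> paths) {
        Set<String> seen = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String p : paths) {
            String name = baseName(p);
            if (!seen.add(name)) {
                dups.add(name);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> listing = listResolved("/foo/bar", fooBar());
        System.out.println(listing);                 // [/foo/bar/a, /foo/bar/b, /x/a, /foo/bar/d]
        System.out.println(duplicateNames(listing)); // [a]
    }
}
```

A caller that copies each listed entry into a destination directory by its final path component would attempt to create "a" twice, which is the kind of collision described above.
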
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch,
>                      HADOOP-9984.005.patch, HADOOP-9984.007.patch,
>                      HADOOP-9984.009.patch, HADOOP-9984.010.patch,
>                      HADOOP-9984.011.patch, HADOOP-9984.012.patch,
>                      HADOOP-9984.013.patch, HADOOP-9984.014.patch,
>                      HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that
> many existing HDFS clients would be broken by listStatus and globStatus
> returning symlinks. One example is applications that assume that
> !FileStatus#isFile implies that the inode is a directory. As we discussed in
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning
> resolved paths.

--
This message was sent by Atlassian JIRA
(v6.1#6144)