[ 
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782505#comment-13782505
 ] 

Andrew Wang commented on HADOOP-9984:
-------------------------------------

Hey Colin, thanks for the patch. I can tell a lot of thought went into this :) 
The additional javadoc helped a lot. FYI, I reviewed v9 of this patch.

Nitty/general comments:
* DCRException#serialVersionUID should be private.
{code}
          throw new DirectoryContentsResolutionException("error resolving " +
...
            throw new DirectoryContentsResolutionException("error " +
...
{code}
* Please capitalize "Error", in a few places. Can grep for it.
{code}
   * a directory.  Symlinks contained in the directory will be resolved.
{code}
* FileContext doesn't use two spaces after a period in javadoc, please audit.
* Can we rename {{listStatusImpl}} to {{listStatusInternal}} per more normal 
naming convention? e.g. DFS#listLinkStatusInternal. Same comment for 
FileContext.
* In a couple javadocs, you now mention DCRException, but didn't add it to the 
{{throws}}. Same for FileContext.

Javadoc:
* Rather than copy+pasting javadoc for similar methods, I like using {{See 
@link}} with a small note about the differences (if any). Will help shrink this 
patch.
* New public {{FileContext#listLocatedStatus}} method needs javadoc.
* In the {{createSymlink}} javadoc, {{listLinkStatus}} should also be in the 
list of functions that fully resolve. Here, it refers to resolving the input 
path, which both {{listStatus}} and {{listLinkStatus}} do. The current note 
about resolving the directory contents is good.

FileSystem:
* Renaming {{abstract listStatus}} to {{abstract listLinkStatus}} is sticky, 
but I think it's the right call. We want {{listStatus}} and {{globStatus}} to 
resolve by default, which means we need subclasses to implement a non-resolving 
{{listLinkStatus}}. This of course breaks out-of-tree {{FileSystems}}; I think 
this is unavoidable, but maybe other reviewers have ideas.
* Could pull out more {{*Internal}} methods, though the amount of code 
duplication isn't that high.

Globber:
* Are we planning to add {{globLinkStatus}} type methods to 
FileSystem/FileContext? Right now we have this {{resolveLinks}} which is always 
true (and in {{TestGlobStatus}} too). It's a little confusing right now; I'd 
like to either see the new APIs included here, or all of it broken out into a 
separate JIRA.
* Rather than {{uriToSchemeAndAuthority}}, can we instead use 
{{Path#makeQualified}} or {{FileSystem#makeQualified}}? If not, I also 
preferred the old style, since using parameters as return values kinda bites.

Misc:
* In WebHdfs and HttpFS, the {{Op}} / {{Operation}} is still called 
{{LISTSTATUS}}.
* Please use {{GenericTestUtils#assertExceptionContains}} in the new symlink 
test, you can check for the right path in the exception message.
* This is a complex, edge-case-happy change, so I'd like to see a lot of tests. 
Input paths with and without symlinks, intermediate symlinks, links to links, 
links to {{0000}} files, results with and without symlinks (with different 
dangling errors), wildcards and not in different parts of the path, and 
PathFilters that match against the input path prefix and expected directory 
contents. Go wild :)

* There are still inconsistencies about what {{Path#getPath}} will be when 
calling the various {{Status}} methods.
** {{FileSystem#listStatus}} first resolves the input path, then passes the 
resulting listing through {{getFileStatus}} which resolves, then passes it 
through the {{PathFilter}}. This means resolution happens in two places, both 
of which mess up the expectation of the PathFilter (which wants unresolved 
paths).
** Looks like the above also applies to {{FileContext}}.
** {{Globber#glob}}'s filter looks like it does the right thing, since it 
resets the path to the unresolved path after {{listLinkStatus}} and 
{{getFileLinkStatus}}. It unfortunately means we have to resolve each time 
(perf hit), since we aren't storing the resolved path in tandem.
** For compatibility, I think we want {{getPath}} to always be returning 
unresolved paths, I'd say even for {{getFileLinkStatus}}. We should add a 
{{Path#getResolvedPath}} method for symlink-aware users who want to avoid 
unnecessary resolutions (like {{Globber}}). We can do this in a follow-on 
though.
* Tests for this are important too.

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by 
> default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, 
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, 
> HADOOP-9984.010.patch, HADOOP-9984.011.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that 
> many existing HDFS clients would be broken by listStatus and globStatus 
> returning symlinks.  One example is applications that assume that 
> !FileStatus#isFile implies that the inode is a directory.  As we discussed in 
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning 
> resolved paths.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to