[ 
https://issues.apache.org/jira/browse/HADOOP-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728284#comment-13728284
 ] 

Andrew Wang commented on HADOOP-9817:
-------------------------------------

Hi Colin,

I like this single Globber class a lot more than the previous mess we were 
dealing with. The rewrite (with comments!) is much easier to follow.

Here are some comments for Globber:

- How about making this an abstract class, with subclasses for FileSystem and 
FileContext? Cleaner than having if statements.
- Whitespace is weird for these lines:
{code}
    ArrayList<FileStatus> results = 
        new ArrayList<FileStatus>(flattenedPatterns .size());
...
    for (String flatPattern : flattenedPatterns ) {
...
      Path absPattern =
          fixRelativePart(new Path(flatPattern .isEmpty() ? "." : flatPattern 
));
{code}

- The comment around the {{sawWildcard}} if statement at the end could be 
beefed up. This looks like some kind of big compatibility wart for 
{{FileSystem}}, so it'd be helpful to see the expected cases laid out (and any 
more rationale, like the {{FSShell}}).
- Minor optimization, but we can break out of the {{components}} for loop early 
if {{candidates}} is empty.
- I think we can simplify this path expression:
{code}
            child.setPath(new Path(scheme, authority,
                new Path(candidate.getPath(),
                    new Path(child.getPath().getName())).toUri().getPath()));
            // equivalent to this?
            child.setPath(new Path(candidate.getPath(),
                    new Path(child.getPath().getName())));
{code}

- Shouldn't this be comparing against just the relevant component 
{{child.getPath().getName()}}? Also, could we even just push this filtering 
down to the NN by doing {{listStatus(Path, GlobFilter}}?
{code}
            if (globFilter.accept(child.getPath())) {
              newCandidates.add(child);
            }
{code}
- I don't understand this bit. If the flattened pattern is empty, we list the 
current working directory? This looks like a carry-over from FileSystem, where 
it looks like "." and other non-glob paths trigger some special logic via 
{{components = null}}. Probably warrants at least a comment.
- Also related, if we pass in some weird glob like \{a,\}/\{b,\}/\{c,\} that 
expands to an empty path multiple times, does this trigger listing the working 
directory multiple times?
{code}
      Path absPattern =
          fixRelativePart(new Path(flatPattern .isEmpty() ? "." : flatPattern 
));
{code}
- Noticed that Globber does manual slash removal. Could we use 
{{URI#normalize}} for this? Normalization is supposed to happen as part of URI 
resolution, so the Path's underlying URI might already do this for you.
                
> FileSystem#globStatus and FileContext#globStatus need to work with symlinks
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-9817
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9817
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HADOOP-9817.004.patch, HADOOP-9817.005.patch
>
>
> FileSystem#globStatus and FileContext#globStatus need to work with symlinks.  
> Currently, they resolve all links, so that if you have:
> {code}
> /alpha/beta
> /alphaLink -> alpha
> {code}
> and you take {{globStatus(/alphaLink/*)}}, you will get {{/alpha/beta}}, 
> rather than the expected {{/alphaLink/beta}}.
> We even resolve terminal symlinks, which would prevent listing a symlink in 
> FSShell, for example.  Instead, we should build up the path incrementally.  
> This will allow the shell to behave as expected, and also allow custom 
> globbers to "see" the correct paths for symlinks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to