[
https://issues.apache.org/jira/browse/HADOOP-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728284#comment-13728284
]
Andrew Wang commented on HADOOP-9817:
-------------------------------------
Hi Colin,
I like this single Globber class a lot more than the previous mess we were
dealing with. The rewrite (with comments!) is much easier to follow.
Here are some comments for Globber:
- How about making this an abstract class, with subclasses for FileSystem and
FileContext? Cleaner than having if statements.
- Whitespace is weird for these lines:
{code}
ArrayList<FileStatus> results =
new ArrayList<FileStatus>(flattenedPatterns .size());
...
for (String flatPattern : flattenedPatterns ) {
...
Path absPattern =
fixRelativePart(new Path(flatPattern .isEmpty() ? "." : flatPattern
));
{code}
- The comment around the {{sawWildcard}} if statement at the end could be
beefed up. This looks like some kind of big compatibility wart for
{{FileSystem}}, so it'd be helpful to see the expected cases laid out (and any
more rationale, like the {{FSShell}}).
- Minor optimization, but we can break out of the {{components}} for loop early
if {{candidates}} is empty.
- I think we can simplify this path expression:
{code}
child.setPath(new Path(scheme, authority,
new Path(candidate.getPath(),
new Path(child.getPath().getName())).toUri().getPath()));
// equivalent to this?
child.setPath(new Path(candidate.getPath(),
new Path(child.getPath().getName())));
{code}
- Shouldn't this be comparing against just the relevant component
{{child.getPath().getName()}}? Also, could we even just push this filtering
down to the NN by doing {{listStatus(Path, GlobFilter}}?
{code}
if (globFilter.accept(child.getPath())) {
newCandidates.add(child);
}
{code}
- I don't understand this bit. If the flattened pattern is empty, we list the
current working directory? This looks like a carry-over from FileSystem, where
it looks like "." and other non-glob paths trigger some special logic via
{{components = null}}. Probably warrants at least a comment.
- Also related, if we pass in some weird glob like \{a,\}/\{b,\}/\{c,\} that
expands to an empty path multiple times, does this trigger listing the working
directory multiple times?
{code}
Path absPattern =
fixRelativePart(new Path(flatPattern .isEmpty() ? "." : flatPattern
));
{code}
- Noticed that Globber does manual slash removal. Could we use
{{URI#normalize}} for this? Normalization is supposed to happen as part of URI
resolution, so the Path's underlying URI might already do this for you.
> FileSystem#globStatus and FileContext#globStatus need to work with symlinks
> ---------------------------------------------------------------------------
>
> Key: HADOOP-9817
> URL: https://issues.apache.org/jira/browse/HADOOP-9817
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HADOOP-9817.004.patch, HADOOP-9817.005.patch
>
>
> FileSystem#globStatus and FileContext#globStatus need to work with symlinks.
> Currently, they resolve all links, so that if you have:
> {code}
> /alpha/beta
> /alphaLink -> alpha
> {code}
> and you take {{globStatus(/alphaLink/*)}}, you will get {{/alpha/beta}},
> rather than the expected {{/alphaLink/beta}}.
> We even resolve terminal symlinks, which would prevent listing a symlink in
> FSShell, for example. Instead, we should build up the path incrementally.
> This will allow the shell to behave as expected, and also allow custom
> globbers to "see" the correct paths for symlinks.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira