[
https://issues.apache.org/jira/browse/HADOOP-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728367#comment-13728367
]
Colin Patrick McCabe commented on HADOOP-9817:
----------------------------------------------
bq. How about making this an abstract class, with subclasses for FileSystem and
FileContext? Cleaner than having if statements.
I'll give it a try. Hopefully it won't be that much longer.
bq. Minor optimization, but we can break out of the components for loop early
if candidates is empty.
Ok.
bq. Whitespace is weird for these lines:
OK.
bq. I think we can simplify this path expression:
yeah.
bq. Shouldn't this be comparing against just the relevant component
child.getPath().getName()
check {{GlobFilter#accept}}. The first thing it does is call {{getName()}}.
bq. Also related, if we pass in some weird glob like {a,}/{b,}/{c,} that
expands to an empty path multiple times, does this trigger listing the working
directory multiple times?
The pattern you gave would expand to
{code}
$HOME/a/b/c
$HOME/a/b
$HOME/a/c
$HOME/a
$HOME/b/c
$HOME/b
$HOME
{code}
The home directory itself is only returned once because there's only one path
through the subgroups that returns all empties.
Now if you had something like this
{code}
{,,}
{code}
Then you would be listing the home directory three times, because you requested
it three times.
Similarly
{code}
{a,a,a}
{code}
should map to
{code}
/a
/a
/a
{code}
Nothing unusual here?
bq. Noticed that Globber does manual slash removal. Could we use URI#normalize
for this? Normalization is supposed to happen as part of URI resolution, so the
Path's underlying URI might already do this for you.
It's really more just going from a string to path components. Normally the
only component that gets removed is the first one (for /). I agree that URI
normalization should handle this, but this is kind of a belt-and-suspenders
approach.
bq. The comment around the sawWildcard if statement at the end could be beefed
up. This looks like some kind of big compatibility wart for FileSystem, so it'd
be helpful to see the expected cases laid out (and any more rationale, like the
FSShell).
I'll see if I can come up with a better comment. And yes, as you guessed, it
is a big compatibility wart (in my opinion at least.)
bq. I don't understand this bit. If the flattened pattern is empty, we list the
current working directory?
Yeah. Will add a comment.
> FileSystem#globStatus and FileContext#globStatus need to work with symlinks
> ---------------------------------------------------------------------------
>
> Key: HADOOP-9817
> URL: https://issues.apache.org/jira/browse/HADOOP-9817
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HADOOP-9817.004.patch, HADOOP-9817.005.patch
>
>
> FileSystem#globStatus and FileContext#globStatus need to work with symlinks.
> Currently, they resolve all links, so that if you have:
> {code}
> /alpha/beta
> /alphaLink -> alpha
> {code}
> and you take {{globStatus(/alphaLink/*)}}, you will get {{/alpha/beta}},
> rather than the expected {{/alphaLink/beta}}.
> We even resolve terminal symlinks, which would prevent listing a symlink in
> FSShell, for example. Instead, we should build up the path incrementally.
> This will allow the shell to behave as expected, and also allow custom
> globbers to "see" the correct paths for symlinks.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira