[ 
https://issues.apache.org/jira/browse/HADOOP-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728367#comment-13728367
 ] 

Colin Patrick McCabe commented on HADOOP-9817:
----------------------------------------------

bq. How about making this an abstract class, with subclasses for FileSystem and 
FileContext? Cleaner than having if statements.

I'll give it a try.  Hopefully it won't be that much longer.

bq. Minor optimization, but we can break out of the components for loop early 
if candidates is empty.

Ok.

bq. Whitespace is weird for these lines:

OK.

bq. I think we can simplify this path expression:

yeah.

bq. Shouldn't this be comparing against just the relevant component 
child.getPath().getName()

check {{GlobFilter#accept}}.  The first thing it does is call {{getName()}}.

bq. Also related, if we pass in some weird glob like {a,}/{b,}/{c,} that 
expands to an empty path multiple times, does this trigger listing the working 
directory multiple times?

The pattern you gave would expand to
{code}
$HOME/a/b/c
$HOME/a/b
$HOME/a/c
$HOME/a
$HOME/b/c
$HOME/b
$HOME
{code}

The home directory itself is only returned once because there's only one path 
through the subgroups that returns all empties.

Now if you had something like this
{code}
{,,}
{code}

Then you would be listing the home directory three times, because you requested 
it three times.

Similarly 
{code}
{a,a,a}
{code}

should map to 
{code}
/a
/a
/a
{code}

Nothing unusual here?

bq. Noticed that Globber does manual slash removal. Could we use URI#normalize 
for this? Normalization is supposed to happen as part of URI resolution, so the 
Path's underlying URI might already do this for you.

It's really more just going from a string to path components.  Normally the 
only component that gets removed is the first one (for /).  I agree that URI 
normalization should handle this, but this is kind of a belt-and-suspenders 
approach.

bq. The comment around the sawWildcard if statement at the end could be beefed 
up. This looks like some kind of big compatibility wart for FileSystem, so it'd 
be helpful to see the expected cases laid out (and any more rationale, like the 
FSShell).

I'll see if I can come up with a better comment.  And yes, as you guessed, it 
is a big compatibility wart (in my opinion at least.)

bq. I don't understand this bit. If the flattened pattern is empty, we list the 
current working directory?

Yeah.  Will add a comment.
                
> FileSystem#globStatus and FileContext#globStatus need to work with symlinks
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-9817
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9817
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HADOOP-9817.004.patch, HADOOP-9817.005.patch
>
>
> FileSystem#globStatus and FileContext#globStatus need to work with symlinks.  
> Currently, they resolve all links, so that if you have:
> {code}
> /alpha/beta
> /alphaLink -> alpha
> {code}
> and you take {{globStatus(/alphaLink/*)}}, you will get {{/alpha/beta}}, 
> rather than the expected {{/alphaLink/beta}}.
> We even resolve terminal symlinks, which would prevent listing a symlink in 
> FSShell, for example.  Instead, we should build up the path incrementally.  
> This will allow the shell to behave as expected, and also allow custom 
> globbers to "see" the correct paths for symlinks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to