[ https://issues.apache.org/jira/browse/HADOOP-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095847#comment-14095847 ]

Colin Patrick McCabe commented on HADOOP-10942:
-----------------------------------------------

bq. For the immediate file status, the prior code used to loop over the path 
components even if there are globs. In this patch, it does an immediate file 
status on the full path. This reduces the overhead for FsShell commands.

You always need to loop when there are globs.  You need to see which children 
match the glob and which don't.  I think what you meant to write is "the prior 
code used to loop over the path components even if there are *not* globs".

Looping is not a problem, though.  Calling {{listStatus}} or {{getFileStatus}} is
what generates RPCs, and the existing globber code only makes those calls when
it needs to.

A simple way to see this is to add a LOG.info statement to
{{Globber#listStatus}} and {{Globber#getFileStatus}}, and then run
{{hadoop fs \-ls}} on a path without globs.  The only output you will see is a
single call to {{getFileStatus}}, because that's the only call that's needed.
The looping inside the function doesn't matter, because most loop iterations
don't generate an RPC.
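The experiment above can be approximated without a cluster. The sketch below is a toy model, not the real {{Globber}}: a hypothetical in-memory namespace whose {{getFileStatus}} and {{listStatus}} methods bump counters instead of issuing RPCs, plus a simplified walk that appends literal components with no call at all, lists the parent only for components containing {{*}}, and verifies trailing literal components with one {{getFileStatus}} per candidate. It assumes absolute paths and treats only {{*}} as a wildcard.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RpcCountingGlobber {
    // Hypothetical in-memory namespace: directory path -> child names.
    // A path is a file if it is listed as a child but has no entry of its own.
    private final Map<String, List<String>> dirs = new HashMap<>();
    int getFileStatusCalls = 0;
    int listStatusCalls = 0;

    void mkdir(String dir, String... children) {
        dirs.put(dir, List.of(children));
    }

    // Stand-ins for the two calls that would become RPCs. The counters play
    // the role of the LOG.info statements suggested in the comment.
    boolean getFileStatus(String path) {
        getFileStatusCalls++;
        if (dirs.containsKey(path)) {
            return true;
        }
        int slash = path.lastIndexOf('/');
        String parent = (slash == 0) ? "/" : path.substring(0, slash);
        List<String> siblings = dirs.get(parent);
        return siblings != null && siblings.contains(path.substring(slash + 1));
    }

    List<String> listStatus(String dir) {
        listStatusCalls++;
        return dirs.getOrDefault(dir, List.of());
    }

    // Only '*' is a wildcard in this sketch; everything else is literal.
    static boolean matches(String glob, String name) {
        String[] pieces = glob.split("\\*", -1);
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                re.append(".*");
            }
            re.append(Pattern.quote(pieces[i]));
        }
        return name.matches(re.toString());
    }

    // Walk the pattern one component at a time. Literal components only build
    // strings; glob components list their parent. If the last component is
    // literal, one getFileStatus per candidate confirms that it exists.
    List<String> glob(String pattern) {
        List<String> candidates = new ArrayList<>(List.of("/"));
        boolean needsVerify = false;
        for (String comp : pattern.substring(1).split("/")) {
            boolean isGlob = comp.contains("*");
            List<String> next = new ArrayList<>();
            for (String parent : candidates) {
                String prefix = parent.equals("/") ? "/" : parent + "/";
                if (!isGlob) {
                    next.add(prefix + comp); // loop iteration, but no "RPC"
                } else {
                    for (String child : listStatus(parent)) {
                        if (matches(comp, child)) {
                            next.add(prefix + child);
                        }
                    }
                }
            }
            candidates = next;
            needsVerify = !isGlob; // a listing already proved its matches exist
        }
        if (!needsVerify) {
            return candidates;
        }
        List<String> existing = new ArrayList<>();
        for (String path : candidates) {
            if (getFileStatus(path)) {
                existing.add(path);
            }
        }
        return existing;
    }

    public static void main(String[] args) {
        RpcCountingGlobber fs = new RpcCountingGlobber();
        fs.mkdir("/", "user");
        fs.mkdir("/user", "alice");
        fs.mkdir("/user/alice", "logs");
        fs.mkdir("/user/alice/logs", "a.log", "b.log", "notes.txt");
        List<String> result = fs.glob("/user/alice/logs/a.log");
        System.out.println(result + " getFileStatus=" + fs.getFileStatusCalls
                + " listStatus=" + fs.listStatusCalls);
    }
}
```

For {{/user/alice/logs/a.log}} the loop runs four times, but the counters end at one {{getFileStatus}} and zero {{listStatus}} calls. For {{/user/alice/logs/*.log}} they end at one {{listStatus}} and zero {{getFileStatus}} calls.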

> Globbing optimizations and regression fix
> -----------------------------------------
>
>                 Key: HADOOP-10942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10942
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HADOOP-10942.patch
>
>
> When globbing was commonized to support both filesystem and filecontext, it
> regressed a fix that prevents an intermediate glob that matches a file from
> throwing a confusing permissions exception.  The hdfs traverse check requires
> the exec bit, which a file does not have.
> Additional optimizations intended to reduce rpcs actually increase them if
> directories contain 1 item.
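One way to picture the regression described above: when an intermediate glob component matches a file, descending into that file hits the traverse check and surfaces a permissions error instead of simply producing no match. The sketch below is a hypothetical toy model, not the attached patch. {{SkipFilesSketch}}, {{Entry}} and {{PermissionDenied}} are invented names, and listing a file is modeled by throwing {{PermissionDenied}}. With {{skipFiles}} enabled, non-final components that matched a file are dropped, using the directory bit already present in the parent's listing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class SkipFilesSketch {
    // Hypothetical listing entry: a name plus the directory bit that a
    // listing result already carries, so checking it costs no extra call.
    static final class Entry {
        final String name;
        final boolean isDirectory;
        Entry(String name, boolean isDirectory) {
            this.name = name;
            this.isDirectory = isDirectory;
        }
    }

    // Hypothetical stand-in for the confusing permissions exception.
    static final class PermissionDenied extends RuntimeException {
        PermissionDenied(String path) {
            super("Permission denied: " + path);
        }
    }

    private final Map<String, List<Entry>> dirs = new HashMap<>();

    void put(String dir, Entry... children) {
        dirs.put(dir, List.of(children));
    }

    // Listing a path with no directory entry (here: a file) fails the way the
    // issue describes: traversal needs the exec bit, which a file lacks.
    List<Entry> listStatus(String path) {
        List<Entry> children = dirs.get(path);
        if (children == null) {
            throw new PermissionDenied(path);
        }
        return children;
    }

    // Only '*' is a wildcard here; all other characters match literally.
    static boolean matches(String glob, String name) {
        String[] pieces = glob.split("\\*", -1);
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                re.append(".*");
            }
            re.append(Pattern.quote(pieces[i]));
        }
        return name.matches(re.toString());
    }

    // Expand every component by listing its parent. When skipFiles is true, a
    // non-final component that matched a file is dropped, because a file
    // cannot contain the next component and listing it would only throw.
    List<String> glob(String pattern, boolean skipFiles) {
        String[] comps = pattern.substring(1).split("/");
        List<String> candidates = new ArrayList<>(List.of("/"));
        for (int i = 0; i < comps.length; i++) {
            boolean last = (i == comps.length - 1);
            List<String> next = new ArrayList<>();
            for (String dir : candidates) {
                String prefix = dir.equals("/") ? "/" : dir + "/";
                for (Entry child : listStatus(dir)) {
                    if (!matches(comps[i], child.name)) {
                        continue;
                    }
                    if (skipFiles && !last && !child.isDirectory) {
                        continue;
                    }
                    next.add(prefix + child.name);
                }
            }
            candidates = next;
        }
        return candidates;
    }

    static SkipFilesSketch sample() {
        SkipFilesSketch fs = new SkipFilesSketch();
        fs.put("/", new Entry("data", true));
        fs.put("/data", new Entry("2014", true), new Entry("README", false));
        fs.put("/data/2014", new Entry("part-0", false));
        return fs;
    }

    public static void main(String[] args) {
        SkipFilesSketch fs = sample();
        System.out.println(fs.glob("/data/*/part-*", true));
        try {
            fs.glob("/data/*/part-*", false);
        } catch (PermissionDenied e) {
            System.out.println("without the check: " + e.getMessage());
        }
    }
}
```

Since the directory bit arrives with the parent listing, skipping files this way adds no calls. The final component is still allowed to match files, so {{/data/*}} returns both {{/data/2014}} and {{/data/README}}.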



--
This message was sent by Atlassian JIRA
(v6.2#6252)
