[ 
https://issues.apache.org/jira/browse/HDFS-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated HDFS-8234:
-------------------------------------
    Description: 
HDFS-985 added partial listing to listStatus so that the entries of a large 
directory are not all listed in one go. However, when listStatus(Path p, 
PathFilter f) is called, the filter is applied only after all the entries have 
been fetched, so a big list is still constructed on the client side. It would 
be more efficient if DistributedFileSystem.listStatusInternal() applied the 
PathFilter. DistributedFileSystem should therefore override listStatus(Path f, 
PathFilter filter) and apply the PathFilter early. 
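
A minimal sketch of the idea, using stand-in types rather than the real Hadoop 
classes. PagedLister, listPage and listFiltered are hypothetical names chosen 
for illustration and are not DistributedFileSystem API; listPage plays the role 
of one partial-listing call. The filter is applied to each batch as it arrives, 
so the accumulated list only ever holds accepted entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a directory that is served in fixed-size batches,
// loosely modeled on the partial listing described in HDFS-985.
class PagedLister {
    private final List<String> entries; // entry names, in listing order
    private final int pageSize;

    PagedLister(List<String> entries, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.entries = entries;
        this.pageSize = pageSize;
    }

    // Returns one batch of at most pageSize entries, starting at index 'start'.
    List<String> listPage(int start) {
        int end = Math.min(start + pageSize, entries.size());
        return new ArrayList<>(entries.subList(start, end));
    }

    // Applies the filter batch by batch instead of after the full listing.
    List<String> listFiltered(Predicate<String> filter) {
        List<String> accepted = new ArrayList<>();
        int start = 0;
        while (start < entries.size()) {
            List<String> page = listPage(start);
            for (String name : page) {
                if (filter.test(name)) {
                    accepted.add(name); // rejected entries are dropped here
                }
            }
            start += page.size();
        }
        return accepted;
    }
}
```

With this shape the client holds at most one batch plus the accepted entries at 
any time, instead of every entry in the directory.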

Globber.java also applies its filter only after calling listStatus. It should 
instead call listStatus with the PathFilter.

{code}
FileStatus[] children = listStatus(candidate.getPath());
.........
for (FileStatus child : children) {
  // Set the child path based on the parent path.
  child.setPath(new Path(candidate.getPath(),
      child.getPath().getName()));
  if (globFilter.accept(child.getPath())) {
    newCandidates.add(child);
  }
}
{code}
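
A rough sketch of how the loop above might look if the filter were handed to 
the listing call. GlobStepSketch, FilteredLister and expand are hypothetical 
stand-ins, not Globber or FileSystem code, and the sketch assumes the filter 
depends only on the final name component of each child, so that filtering 
before the path is rebased onto the parent gives the same result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class GlobStepSketch {
    // Hypothetical stand-in for a listing call that accepts a filter,
    // in the spirit of listStatus(Path, PathFilter).
    interface FilteredLister {
        List<String> list(String parent, Predicate<String> nameFilter);
    }

    // Expands one glob component: the lister returns only matching child
    // names, so the loop just rebases each child onto the parent path.
    static List<String> expand(FilteredLister fs, String parent,
                               Predicate<String> globFilter) {
        List<String> newCandidates = new ArrayList<>();
        for (String childName : fs.list(parent, globFilter)) {
            String sep = parent.endsWith("/") ? "" : "/";
            newCandidates.add(parent + sep + childName);
        }
        return newCandidates;
    }
}
```

The accept check disappears from the loop body because the lister is trusted 
to have applied it already.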


  was:HDFS-985 added partial listing in listStatus to avoid listing entries of 
large directory in one go. If listStatus(Path p, PathFilter f) call is made, 
filter is applied after fetching all the entries resulting in a big list being 
constructed on the client side. If the 
DistributedFileSystem.listStatusInternal() applied the PathFilter it would be 
more efficient. 

        Summary: DistributedFileSystem and Globber should apply PathFilter 
early  (was: DistributedFileSystem should override listStatus(Path f, 
PathFilter filter) )

> DistributedFileSystem and Globber should apply PathFilter early
> ---------------------------------------------------------------
>
>                 Key: HDFS-8234
>                 URL: https://issues.apache.org/jira/browse/HDFS-8234
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>              Labels: newbie



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
