[ https://issues.apache.org/jira/browse/HDFS-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511306#comment-14511306 ]
Rohini Palaniswamy commented on HDFS-8234:
------------------------------------------
I am not working on it. Please go ahead.
> DistributedFileSystem and Globber should apply PathFilter early
> ---------------------------------------------------------------
>
> Key: HDFS-8234
> URL: https://issues.apache.org/jira/browse/HDFS-8234
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: J.Andreina
> Labels: newbie
>
> HDFS-985 added partial listing to listStatus so that the entries of a large
> directory are not fetched in one go. However, when listStatus(Path p,
> PathFilter f) is called, the filter is applied only after all the entries
> have been fetched, so a big list is still constructed on the client side.
> It would be more efficient if DistributedFileSystem.listStatusInternal()
> applied the PathFilter. DistributedFileSystem should therefore override
> listStatus(Path f, PathFilter filter) and apply the PathFilter early.
> Globber.java also applies its filter after calling listStatus. It should
> instead call listStatus with the PathFilter.
> {code}
> FileStatus[] children = listStatus(candidate.getPath());
> .........
> for (FileStatus child : children) {
>   // Set the child path based on the parent path.
>   child.setPath(new Path(candidate.getPath(),
>       child.getPath().getName()));
>   if (globFilter.accept(child.getPath())) {
>     newCandidates.add(child);
>   }
> }
> {code}
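
To illustrate the difference the description is pointing at, here is a minimal, self-contained sketch in plain Java. It does not use the Hadoop API: EarlyFilterSketch, BatchSource, listThenFilter, filterWhileListing and inMemory are all hypothetical names invented for this example, and entries are reduced to plain strings. It only shows the general idea of filtering each partial listing as it arrives instead of filtering the complete list afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a paged directory listing; NOT the Hadoop API.
final class EarlyFilterSketch {

    /** Returns one batch of entry names starting at index start; empty when done. */
    interface BatchSource {
        List<String> nextBatch(int start, int batchSize);
    }

    /** Late filtering: accumulate every entry first, then filter the full list. */
    static List<String> listThenFilter(BatchSource src, int batchSize,
                                       Predicate<String> filter) {
        List<String> all = new ArrayList<>();
        for (int start = 0; ; start += batchSize) {
            List<String> batch = src.nextBatch(start, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            all.addAll(batch); // the intermediate list grows with the directory size
        }
        List<String> kept = new ArrayList<>();
        for (String name : all) {
            if (filter.test(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    /** Early filtering: test each batch as it arrives and keep only the matches. */
    static List<String> filterWhileListing(BatchSource src, int batchSize,
                                           Predicate<String> filter) {
        List<String> kept = new ArrayList<>();
        for (int start = 0; ; start += batchSize) {
            List<String> batch = src.nextBatch(start, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            for (String name : batch) {
                if (filter.test(name)) {
                    kept.add(name); // rejected entries are never accumulated
                }
            }
        }
        return kept;
    }

    /** In-memory BatchSource over a fixed list, used only for demonstration. */
    static BatchSource inMemory(List<String> entries) {
        return (start, size) -> start >= entries.size()
                ? Collections.<String>emptyList()
                : entries.subList(start, Math.min(start + size, entries.size()));
    }
}
```

Both methods return the same entries in the same order. The difference is that filterWhileListing never holds more than one batch plus the accepted entries, which is the kind of saving the issue asks for on the client side.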
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)