[ 
https://issues.apache.org/jira/browse/HDFS-13623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863426#comment-16863426
 ] 

David Mollitor commented on HDFS-13623:
---------------------------------------

Or...

{code:java}
public Collection<ContentSummary> getContentSummary(Path f, PathFilter filter)...
{code}

... assuming the filter is applied on the server side, not the client.

> getContentSummary to return ContentSummary without hidden files
> ---------------------------------------------------------------
>
>                 Key: HDFS-13623
>                 URL: https://issues.apache.org/jira/browse/HDFS-13623
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>    Affects Versions: 3.1.0
>            Reporter: Miklos Szurap
>            Priority: Major
>
> Improve the 
> [FileSystem.getContentSummary()|http://hadoop.apache.org/docs/r3.1.0/api/org/apache/hadoop/fs/FileSystem.html#getContentSummary-org.apache.hadoop.fs.Path-]
>  method to return a ContentSummary object with 
> "getFileCountWithoutHiddenFiles()" and "getLengthWithoutHiddenFiles()".
> These two new counters should not include hidden files or hidden directories 
> (or their sub-contents).
> {code:java}
> public static final PathFilter HIDDEN_FILES_PATH_FILTER = new PathFilter() {
>   @Override
>   public boolean accept(Path p) {
>     // Names starting with "_" or "." are treated as hidden.
>     String name = p.getName();
>     return !name.startsWith("_") && !name.startsWith(".");
>   }
> };
> {code}
> This would be especially useful for Hive: it could compute table statistics 
> with a single {{contentSummary}} call instead of {{globStatus}} (multiple 
> {{listStatus}} calls) followed by iterating over thousands of objects on 
> the client side.
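
The counting semantics requested in the description above can be sketched on a local directory tree. This is only an illustration under stated assumptions: it uses java.nio.file rather than the Hadoop FileSystem API, the class VisibleContentSummary and its Summary holder are hypothetical names, and the starting directory is assumed to be counted regardless of its own name. It reuses the same "_" and "." prefix rule as HIDDEN_FILES_PATH_FILTER and skips hidden directories together with everything beneath them.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class VisibleContentSummary {

    /** Hypothetical result holder; this is not Hadoop's ContentSummary. */
    public static final class Summary {
        public long fileCount;
        public long length;
    }

    /** Same naming rule as HIDDEN_FILES_PATH_FILTER in the issue description. */
    static boolean isVisible(Path p) {
        Path fileName = p.getFileName();
        if (fileName == null) {
            return true; // a filesystem root has no name to inspect
        }
        String name = fileName.toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    /** Counts regular files and bytes under root, ignoring hidden entries. */
    public static Summary summarize(Path root) throws IOException {
        Summary s = new Summary();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Assumption: the starting directory is never filtered out.
                if (!dir.equals(root) && !isVisible(dir)) {
                    return FileVisitResult.SKIP_SUBTREE; // drops all sub-contents
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile() && isVisible(file)) {
                    s.fileCount++;
                    s.length += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return s;
    }
}
```

The walk here runs wherever the data lives, which is cheap for a local disk. On HDFS the equivalent traversal would only pay off if the NameNode performed it, which is the point of applying the filter on the server side.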



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

