[ 
https://issues.apache.org/jira/browse/HDFS-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782661#action_12782661
 ] 

Yoram Arnon commented on HDFS-786:
----------------------------------

Adding some numbers:

Running the command locally, on the namenode of a particular webmap cluster, 
takes about 3 seconds:
time hadoop/bin/hadoop fs -dus /.../atoms

real    0m2.916s
user    0m1.215s
sys     0m0.171s

Running the same command, still locally but over hftp, takes 18 minutes:
time hadoop/bin/hadoop fs -dus hftp://.../atoms
 
real    18m11.154s
user    10m37.726s
sys     0m16.516s

Running the command remotely over hftp, from a client in a different 
datacenter, took a little over 3 hours (sorry, no 'time' output).


> Implement getContentSummary(..) in HftpFileSystem
> -------------------------------------------------
>
>                 Key: HDFS-786
>                 URL: https://issues.apache.org/jira/browse/HDFS-786
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> HftpFileSystem does not override getContentSummary(..).  As a result, it uses 
> FileSystem's default implementation, which computes content summary on the 
> client side by calling listStatus(..) recursively.  In contrast, 
> DistributedFileSystem has overridden getContentSummary(..) and does the 
> computation on the NameNode.
> Consequently, running "fs -dus" over hftp is much slower than running it over hdfs.
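
To illustrate the difference described in the issue, here is a minimal, self-contained sketch. It does not use Hadoop: ToyNamespace, Entry, clientSideLength, serverSideLength, and the /data paths are hypothetical names invented for this example. It assumes each listStatus call costs one round trip to the server, and it shows that the client-side recursion needs one listing per directory, while a server-side computation can be answered with a single request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a remote file system.
// None of these names are Hadoop APIs.
public class ContentSummarySketch {

    /** Minimal file entry: a path, a length, and a directory flag. */
    static final class Entry {
        final String path;
        final long length;
        final boolean isDir;

        Entry(String path, long length, boolean isDir) {
            this.path = path;
            this.length = length;
            this.isDir = isDir;
        }
    }

    /** Toy namespace that counts how many listing requests it serves. */
    static final class ToyNamespace {
        private final Map<String, List<Entry>> children = new HashMap<>();
        int listCalls = 0;

        void addDir(String parent, String path) {
            children.computeIfAbsent(parent, k -> new ArrayList<>())
                    .add(new Entry(path, 0, true));
            children.computeIfAbsent(path, k -> new ArrayList<>());
        }

        void addFile(String parent, String path, long length) {
            children.computeIfAbsent(parent, k -> new ArrayList<>())
                    .add(new Entry(path, length, false));
        }

        /** Assumed to cost one client/server round trip per call. */
        List<Entry> listStatus(String dir) {
            listCalls++;
            return children.getOrDefault(dir, new ArrayList<>());
        }

        /** Server-side summary: walks its own map, no listing requests. */
        long serverSideLength(String dir) {
            long total = 0;
            for (Entry e : children.getOrDefault(dir, new ArrayList<>())) {
                total += e.isDir ? serverSideLength(e.path) : e.length;
            }
            return total;
        }
    }

    /** Client-side summary: one listStatus request per directory visited. */
    static long clientSideLength(ToyNamespace ns, String dir) {
        long total = 0;
        for (Entry e : ns.listStatus(dir)) {
            total += e.isDir ? clientSideLength(ns, e.path) : e.length;
        }
        return total;
    }

    /** Small made-up tree: /data with two subdirectories and three files. */
    static ToyNamespace sampleTree() {
        ToyNamespace ns = new ToyNamespace();
        ns.addDir("/", "/data");
        ns.addDir("/data", "/data/a");
        ns.addDir("/data", "/data/b");
        ns.addFile("/data/a", "/data/a/part-0", 100);
        ns.addFile("/data/a", "/data/a/part-1", 200);
        ns.addFile("/data/b", "/data/b/part-0", 50);
        return ns;
    }

    public static void main(String[] args) {
        ToyNamespace ns = sampleTree();
        long clientTotal = clientSideLength(ns, "/data");
        System.out.println("client-side total=" + clientTotal
                + " listings=" + ns.listCalls);
        System.out.println("server-side total=" + ns.serverSideLength("/data")
                + " (no listings)");
    }
}
```

Both approaches return the same total; only the number of requests differs, and in this toy model it grows with the number of directories. Real hftp traffic adds further per-request cost (HTTP and response parsing) that the sketch does not model.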

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
