[ https://issues.apache.org/jira/browse/HDFS-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782661#action_12782661 ]
Yoram Arnon commented on HDFS-786:
----------------------------------
Adding some numbers:
Running the command locally, on a particular webmap cluster's namenode, takes
about 3 seconds:
time hadoop/bin/hadoop fs -dus /.../atoms
real 0m2.916s
user 0m1.215s
sys 0m0.171s
Running the same command, still locally but over hftp, takes 18 minutes:
time hadoop/bin/hadoop fs -dus hftp://.../atoms
real 18m11.154s
user 10m37.726s
sys 0m16.516s
Running the command remotely over hftp, from a client in a different datacenter,
took a little over 3 hours (sorry, no 'time' output).
> Implement getContentSummary(..) in HftpFileSystem
> -------------------------------------------------
>
> Key: HDFS-786
> URL: https://issues.apache.org/jira/browse/HDFS-786
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
>
> HftpFileSystem does not override getContentSummary(..). As a result, it uses
> FileSystem's default implementation, which computes content summary on the
> client side by calling listStatus(..) recursively. In contrast,
> DistributedFileSystem has overridden getContentSummary(..) and does the
> computation on the NameNode.
> Consequently, running "fs -dus" over hftp is much slower than running it on hdfs.
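
To illustrate why the recursive default is so much more expensive, here is a minimal Java sketch. It does not use the Hadoop API: Node, RemoteNamespace, listStatus, summarize, and clientSideSum are hypothetical stand-ins, and the only cost modelled is the number of calls to the remote side. It compares a client that walks the tree with one listing call per directory against a single summarizing call evaluated on the server.

```java
import java.util.ArrayList;
import java.util.List;

public class ContentSummarySketch {

    /** Toy file or directory (hypothetical); children is null for a plain file. */
    static final class Node {
        final long length;
        final List<Node> children;

        Node(long length, List<Node> children) {
            this.length = length;
            this.children = children;
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    /** Hypothetical remote namespace: each non-private call counts as one round trip. */
    static final class RemoteNamespace {
        int roundTrips = 0;

        /** One round trip per directory listing. */
        List<Node> listStatus(Node dir) {
            roundTrips++;
            return dir.children;
        }

        /** One round trip in total; the tree walk happens on the "server". */
        long summarize(Node root) {
            roundTrips++;
            return localSum(root);
        }

        private static long localSum(Node n) {
            if (!n.isDirectory()) {
                return n.length;
            }
            long total = 0;
            for (Node c : n.children) {
                total += localSum(c);
            }
            return total;
        }
    }

    /** Client-side recursion: issues one listStatus call for every directory visited. */
    static long clientSideSum(RemoteNamespace ns, Node n) {
        if (!n.isDirectory()) {
            return n.length;
        }
        long total = 0;
        for (Node c : ns.listStatus(n)) {
            total += clientSideSum(ns, c);
        }
        return total;
    }

    /** Builds a root directory holding `dirs` subdirectories of `filesPerDir` files each. */
    static Node buildTree(int dirs, int filesPerDir, long fileLength) {
        List<Node> subdirs = new ArrayList<>();
        for (int d = 0; d < dirs; d++) {
            List<Node> files = new ArrayList<>();
            for (int f = 0; f < filesPerDir; f++) {
                files.add(new Node(fileLength, null));
            }
            subdirs.add(new Node(0, files));
        }
        return new Node(0, subdirs);
    }

    public static void main(String[] args) {
        Node root = buildTree(100, 10, 5L);

        RemoteNamespace recursive = new RemoteNamespace();
        long clientTotal = clientSideSum(recursive, root);

        RemoteNamespace single = new RemoteNamespace();
        long serverTotal = single.summarize(root);

        System.out.println("client-side: total=" + clientTotal
                + " roundTrips=" + recursive.roundTrips);
        System.out.println("server-side: total=" + serverTotal
                + " roundTrips=" + single.roundTrips);
    }
}
```

In this toy model both approaches return the same total, but the recursive one needs a call per directory (101 for the tree built in main) while the server-side one needs a single call. Real costs also depend on per-request latency and response size, which the sketch does not model.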