[
https://issues.apache.org/jira/browse/HADOOP-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652582#action_12652582
]
Pete Wyckoff commented on HADOOP-4752:
--------------------------------------
Are you using -ordbuffer=XXXX, where XXXX is on the order of megabytes? For some
reason the default rd buffer on fuse-dfs is 32K, so every 32K it has to talk to
the DFSClient. That definitely requires a lot of context switches, which will
cause some penalty. I don't have any numbers for that, but if every one of those
has to talk to the DFSClient, it will be expensive.
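
To give a rough feel for the effect of the buffer size, here is a minimal sketch that counts how many times the read buffer has to be refilled for one sequential read. The 1 GiB file size and the 10 MB buffer value are assumptions chosen for illustration, not numbers from this issue, and the sketch only models the count of refills, not the actual cost of each call.

```python
def buffer_refills(file_size_bytes, buffer_bytes):
    """Number of buffer refills needed to read a file sequentially (ceiling division)."""
    return -(-file_size_bytes // buffer_bytes)

FILE_SIZE = 1024 ** 3               # hypothetical 1 GiB file
DEFAULT_BUFFER = 32 * 1024          # the 32K default mentioned above
LARGE_BUFFER = 10 * 1024 * 1024     # hypothetical 10 MB rdbuffer value

calls_default = buffer_refills(FILE_SIZE, DEFAULT_BUFFER)
calls_large = buffer_refills(FILE_SIZE, LARGE_BUFFER)

print(f"32K buffer:  {calls_default} refills")
print(f"10MB buffer: {calls_large} refills")
```

Under these assumptions the larger buffer cuts the number of trips to the DFSClient by a factor of a few hundred, which is why the buffer size is the first thing worth checking.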
Craig M did some benchmarks on reads, but I don't know if he looked at the CPU
usage.
> Major performance drop on slower machines
> -----------------------------------------
>
> Key: HADOOP-4752
> URL: https://issues.apache.org/jira/browse/HADOOP-4752
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/fuse-dfs
> Affects Versions: 0.18.2
> Reporter: Marc-Olivier Fleury
>
> When running fuse_dfs on machines that have different CPU characteristics, I
> noticed that the performance of fuse_dfs is very sensitive to the machine's
> processing power.
> The command I used was simply a cat over a rather large amount of data stored
> on HDFS. Here are the comparative times for the different types of machines:
> Intel(R) Pentium(R) 4 CPU 2.40GHz: 2 min 40 s
> Intel(R) Pentium(R) 4 CPU 3.06GHz: 1 min 50 s
> 2 x Intel(R) Pentium(R) 4 CPU 3.00GHz: 0 min 40 s
> 2 x Intel(R) Xeon(TM) MP CPU 3.33GHz: 0 min 28 s
> Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz: 0 min 15 s
> I tried to find other explanations for the drop in performance, such as
> network configuration, or data locality, but the faster machines are the ones
> that are "further away" from the others considering the network
> configuration, and that don't run datanodes.
> top shows that the CPU usage of fuse_dfs is between 80-90% on the slower
> machines, and about 40% on the fastest one.
> This leads me to the conclusion that fuse_dfs consumes a lot of CPU
> resources, much more than expected.
> Any help or insight concerning this issue will be greatly appreciated, since
> these differences actually add up to days of computation for a given job.
> Thank you
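
To put the reported timings side by side, the following small sketch converts them to seconds and expresses each as a slowdown relative to the fastest machine. The timings are the ones quoted in the description; the shortened machine labels are only for readability.

```python
# Wall-clock times for the same cat, taken from the issue description.
timings_s = {
    "Pentium 4 2.40GHz": 2 * 60 + 40,
    "Pentium 4 3.06GHz": 1 * 60 + 50,
    "2 x Pentium 4 3.00GHz": 40,
    "2 x Xeon MP 3.33GHz": 28,
    "Core 2 Quad Q6600 2.40GHz": 15,
}

fastest = min(timings_s.values())
slowdown = {machine: t / fastest for machine, t in timings_s.items()}

for machine, ratio in slowdown.items():
    print(f"{machine}: {timings_s[machine]} s, {ratio:.1f}x the fastest")
```

The slowest machine takes roughly ten times as long as the fastest one for the same read.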