[jira] Commented: (HADOOP-4752) Major performance drop on slower machines

Craig Macdonald (JIRA) Tue, 06 Jan 2009 12:13:08 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661292#action_12661292
 ]


Craig Macdonald commented on HADOOP-4752:
-----------------------------------------

I did some timings in HADOOP-4; my test network machines were quite dated. 
Comparing fuse_dfs with bin/hadoop fs -cat (and NFS) showed that fuse_dfs was 
distinctly slower, at almost half the speed.

I think we should test libhdfs directly, as this would show whether the 
penalty lies in JNI or in fuse_dfs itself.
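
As a starting point for such timings, here is a minimal sketch of a harness
that runs a command, drains its output, and reports throughput. It only
compares command-line read paths; separating the JNI cost from the fuse_dfs
cost would still need a small program written against libhdfs. The mount
point and file paths are hypothetical placeholders, not taken from this issue.

```python
import subprocess
import time


def time_command(argv, chunk_size=1 << 20):
    """Run argv, drain its stdout, and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        # Read in fixed-size chunks so the data is never held in memory.
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            total += len(chunk)
    # Leaving the with-block waits for the process, so returncode is set.
    elapsed = time.perf_counter() - start
    if proc.returncode != 0:
        raise RuntimeError(f"{argv!r} exited with status {proc.returncode}")
    return total, elapsed


def throughput_mb_per_s(total_bytes, elapsed):
    """Convert a byte count and duration to MB/s (1 MB = 10**6 bytes)."""
    return total_bytes / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical paths: adjust to the local mount point and HDFS file.
    candidates = {
        "fuse_dfs": ["cat", "/mnt/hdfs/data/large_file"],
        "hadoop fs -cat": ["bin/hadoop", "fs", "-cat", "/data/large_file"],
    }
    for label, argv in candidates.items():
        nbytes, secs = time_command(argv)
        print(f"{label}: {nbytes} bytes in {secs:.1f} s "
              f"({throughput_mb_per_s(nbytes, secs):.1f} MB/s)")
```

Running each candidate several times and watching top alongside would help
separate caching effects from CPU cost.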

I've recently documented some thoughts in HADOOP-4932 concerning the 
doConnectAsUser in fuse_dfs, but I realise now that this partly echoes 
Marc-Olivier's thoughts.



> Major performance drop on slower machines
> -----------------------------------------
>
>                 Key: HADOOP-4752
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4752
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>    Affects Versions: 0.18.2
>            Reporter: Marc-Olivier Fleury
>
> When running fuse_dfs on machines that have different CPU characteristics, I 
> noticed that the performance of fuse_dfs is very sensitive to the machine 
> power. 
> The command I used was simply a cat over a rather large amount of data stored 
> on HDFS. Here are the comparative times for the different types of machines:
> Intel(R) Pentium(R) 4 CPU 2.40GHz:              2 min 40 s
> Intel(R) Pentium(R) 4 CPU 3.06GHz:              1 min 50 s
> 2 x Intel(R) Pentium(R) 4 CPU 3.00GHz:          0 min 40 s
> 2 x Intel(R) Xeon(TM) MP CPU 3.33GHz:           0 min 28 s
> Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz:    0 min 15 s
> I tried to find other explanations for the drop in performance, such as 
> network configuration, or data locality, but the faster machines are the ones 
> that are "further away" from the others considering the network 
> configuration, and that don't run datanodes.
> top shows that the CPU usage of fuse_dfs is between 80-90% on the slower 
> machines, and about 40% on the fastest one.
> This leads me to the conclusion that fuse_dfs consumes a lot of CPU 
> resources, much more than expected.
> Any help or insight concerning this issue will be greatly appreciated, since 
> these differences actually result in days of computation for a given job.
> Thank you

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
