Major performance drop on slower machines
-----------------------------------------

                 Key: HADOOP-4752
                 URL: https://issues.apache.org/jira/browse/HADOOP-4752
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/fuse-dfs
    Affects Versions: 0.18.2
            Reporter: Marc-Olivier Fleury


When running fuse_dfs on machines with different CPU characteristics, I 
noticed that the performance of fuse_dfs is very sensitive to the CPU power 
of the machine. 

The command I used was simply a cat over a rather large amount of data stored 
on HDFS. Here are the comparative times for the different types of machines:

Intel(R) Pentium(R) 4 CPU 2.40GHz:                   2 min 40 s
Intel(R) Pentium(R) 4 CPU 3.06GHz:                   1 min 50 s
2 x Intel(R) Pentium(R) 4 CPU 3.00GHz:               0 min 40 s
2 x Intel(R) Xeon(TM) MP CPU 3.33GHz:                0 min 28 s
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz:         0 min 15 s
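
To put the spread in perspective, the timings above can be normalized against 
the fastest machine. The following is a minimal sketch that only re-expresses 
the numbers from the table; the shortened machine labels are my own and 
nothing else is assumed:

```python
# Wall-clock times for `cat` over the same HDFS data through fuse_dfs,
# copied from the table above as (minutes, seconds).
TIMINGS = {
    "Pentium 4 2.40GHz": (2, 40),
    "Pentium 4 3.06GHz": (1, 50),
    "2 x Pentium 4 3.00GHz": (0, 40),
    "2 x Xeon MP 3.33GHz": (0, 28),
    "Core 2 Quad Q6600 2.40GHz": (0, 15),
}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to a total number of seconds."""
    return minutes * 60 + seconds


def slowdown_vs_fastest(timings):
    """Return each machine's run time divided by the fastest run time."""
    secs = {name: to_seconds(m, s) for name, (m, s) in timings.items()}
    fastest = min(secs.values())
    return {name: t / fastest for name, t in secs.items()}


if __name__ == "__main__":
    for name, ratio in slowdown_vs_fastest(TIMINGS).items():
        print(f"{name:28s} {ratio:5.2f}x")
```

The slowest machine takes 160 s versus 15 s on the fastest one, i.e. roughly 
10.7 times longer for the same read.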

I tried to find other explanations for the drop in performance, such as network 
configuration or data locality, but the faster machines are the ones that are 
"further away" from the others in the network topology, and they do not run 
datanodes.

top shows that the CPU usage of fuse_dfs is between 80% and 90% on the slower 
machines, and about 40% on the fastest one.

This leads me to the conclusion that fuse_dfs consumes a lot of CPU resources, 
much more than expected.

Any help or insight concerning this issue will be greatly appreciated, since 
these differences actually amount to days of computation for a given job.

Thank you
