Hello,
I have been using Hadoop on a cluster with AMD Opteron Processor 2212 CPUs
clocked at 2GHz and also on a cluster with Atom N330 CPUs clocked at 1.6GHz.
Both are dual-core. I always use HDFS to store input and output data, and
I observe high CPU consumption caused by HDFS on both clusters.
In the AMD cluster, the bottleneck is the disk. I use TestDFSIO to test
the performance. The write throughput to HDFS is about 50MB/s when the
replication factor is 1 and each node runs one mapper, but the CPU
consumption is about 50% for the DataNode and about 40% for the TestDFSIO
mapper. In the Atom cluster, the bottleneck is the CPU. With the same
settings I get a similar write throughput, but the CPU consumption is
close to 100% for both the DataNode and the mapper. Could anyone tell me
what CPU usage you see in your cluster?
Thanks,
Da