I installed hadoop-0.20.2 in a Eucalyptus VM environment. The file system is based on glusterfs, so it is a shared NAS. Although the nodes are quite powerful (8 cores and 15 GB of memory), the Hadoop namenode and datanodes respond very slowly. For example, after running start-all.sh, the datanodes take more than five minutes to become ready, and the namenode stays in safe mode for a very long time. Jobs also run much slower than they did on our old physical cluster. I have tried running Hadoop on a 15-node VM cluster and in pseudo-distributed mode on a single VM, and both are very slow.

Is the NAS an I/O bottleneck? Running HDFS on top of glusterfs feels like reinventing the wheel, so I tried setting the replication factor to values from 1 to 4, but saw no improvement. I haven't tried the CDH3 package yet, and I wonder whether switching to CDH3 would bring any significant improvement. Any suggestions about this issue are highly appreciated.
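One rough way to test the NAS-bottleneck theory is to compare raw sequential write throughput on the glusterfs mount with a VM-local disk. The Python sketch below is only an assumption-laden probe, not an HDFS benchmark: it writes a temporary file, forces it to storage with fsync, and reports MB/s. The mount path you pass in (for example something like /mnt/glusterfs) is hypothetical, and the probe does not reproduce HDFS's block-level access pattern.

```python
import os
import sys
import tempfile
import time


def measure_write_throughput(directory: str, size_mb: int = 64, block_kb: int = 1024) -> float:
    """Write about size_mb of random data to a temp file under `directory`,
    fsync it, delete it, and return the observed throughput in MB/s.

    Hypothetical diagnostic helper: it measures sequential writes only.
    """
    block = os.urandom(block_kb * 1024)
    blocks = max(1, (size_mb * 1024) // block_kb)
    fd, path = tempfile.mkstemp(dir=directory, prefix="io_probe_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Push the data to the storage backend instead of leaving it in the page cache.
            os.fsync(f.fileno())
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    written_mb = blocks * block_kb / 1024
    return written_mb / elapsed


if __name__ == "__main__":
    # Pass one or more directories to compare, e.g. a glusterfs mount and a local disk.
    targets = sys.argv[1:] or [tempfile.gettempdir()]
    for target in targets:
        print(f"{target}: {measure_write_throughput(target):.1f} MB/s")
```

Running it once against the glusterfs mount and once against a local directory on the same VM gives a first indication of whether storage, rather than Hadoop itself, dominates the slowdown.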

Shi
