On 29/08/11 16:32, Shi Yu wrote:
> I installed hadoop-0.20.2 in a Eucalyptus VM environment. The file system is based on glusterfs, so it is a shared NAS. Though the nodes are quite powerful (8 cores + 15G memory), I found the response of the hadoop namenode and datanodes became very slow. For example, after running start-all.sh, the datanodes take more than 5 minutes to be ready. The safe mode time is really, really long. Moreover, the program also runs much slower than it did on the old physical cluster nodes. I have tried running hadoop on a cluster containing 15 VM nodes, and also on a pseudo-cluster on a single VM; all are very slow. Is it because the NAS is an IO bottleneck? Creating HDFS on top of glusterfs is like reinventing the wheel, so I tried adjusting the replication setting to different values (1 to 4), but saw no improvement. I haven't tried the CDH3 package yet.

Why use HDFS at all? If it's a shared fs, use file:// URLs.

> I wonder whether switching to CDH3 would bring any significant improvement.

It won't
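
For what it's worth, a rough sketch of the file:// suggestion, assuming the Hadoop 0.20.x property names and a hypothetical glusterfs mount at /mnt/gluster that is visible at the same path on every node (the mount point and directory names are made up for illustration; adjust to your setup):

```xml
<!-- conf/core-site.xml: make the local (shared) filesystem the default
     instead of an hdfs:// URL, so no namenode or datanodes are needed -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: assumption, not verified on this cluster:
     the JobTracker's system directory probably has to live on the
     shared mount so that every TaskTracker can read job files from it.
     /mnt/gluster/hadoop/mapred/system is a hypothetical path. -->
<configuration>
  <property>
    <name>mapred.system.dir</name>
    <value>/mnt/gluster/hadoop/mapred/system</value>
  </property>
</configuration>
```

Job input and output would then be given as paths on the shared mount, e.g. file:///mnt/gluster/input, and only the MapReduce daemons need starting (start-mapred.sh rather than start-all.sh).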
