On 29/08/11 16:32, Shi Yu wrote:
> I installed hadoop-0.20.2 in a Eucalyptus VM environment. The file
> system is based on glusterfs, so it is a shared NAS. Though the nodes
> are quite powerful (8 cores + 15 GB memory), I found the response of
> the hadoop namenode and datanodes became very slow. For example, after
> running start-all.sh, the datanodes take more than 5 minutes to be
> ready. The safe mode time is really long. Moreover, the program also
> runs much slower than it did on the old physical cluster nodes. I have
> tried running hadoop on a cluster containing 15 VM nodes, and also on a
> pseudo cluster on a single VM; all were very slow. Is it because the
> NAS is an IO bottleneck? Creating HDFS on top of glusterfs is like
> reinventing the wheel, so I tried adjusting the replication setting to
> different values (1 to 4) but saw no improvement. I haven't tried the
> CDH3 package yet.

Why use HDFS at all? If it's a shared filesystem, use file:// URLs.
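
A minimal sketch of what that could look like on 0.20.x, assuming the
glusterfs volume is mounted at the same path on every node. Setting the
default filesystem in conf/core-site.xml to the local one should mean
that no namenode or datanodes are needed:

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: a sketch only, not a tested configuration -->
<configuration>
  <property>
    <!-- Default filesystem; file:/// selects Hadoop's local filesystem -->
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>
```

Job input and output paths would then be given as file:// URLs under the
shared mount, e.g. file:///mnt/glusterfs/input (a hypothetical path).
The JobTracker and TaskTrackers still have to run, so start-mapred.sh
rather than start-all.sh would be the script to use.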



> I wonder whether switching to CDH3 would bring any significant
> improvement.

It won't.
