Hi,

I am using Hadoop 0.19.0 on EC2. The Hadoop execution and HDFS directories are on
EBS volumes mounted to each node in my EC2 cluster; only the Hadoop install itself
is in the AMI. We have 10 EBS volumes, and when the cluster starts it randomly
picks one for each slave. We don't always start all 10 slaves, depending on what
type of work we are going to do.

Every third or fourth start of the cluster, the namenode goes into safe mode
and won't come out automatically. Restarting the datanodes and task trackers on
each of the slaves doesn't help. There isn't much in the log files besides the
message about waiting for the available block percentage. Forcing it out of safe
mode allows the cluster to start working.
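
For reference, by forcing it out I mean the standard dfsadmin safe mode
commands (run from the Hadoop install directory; this is the generic
workaround, nothing EC2-specific):

    # check whether the namenode is still in safe mode
    bin/hadoop dfsadmin -safemode get

    # force the namenode to leave safe mode
    bin/hadoop dfsadmin -safemode leave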

My only thought is that some blocks are stored on one of the EBS volumes that
isn't mounted when we start a smaller configuration (say, 6 nodes instead of
10). But isn't HDFS fault tolerant, so that if a node is missing it carries on?
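
For what it's worth, the standard ways to check whether blocks are actually
missing seem to be the dfsadmin report and fsck (assuming the stock 0.19
tools; I haven't found anything EBS-specific):

    # summary of live/dead datanodes and reported capacity
    bin/hadoop dfsadmin -report

    # scan the filesystem for missing, corrupt, or under-replicated blocks
    bin/hadoop fsck /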

Any advice on why the namenode and datanodes can't find all the data blocks,
or where to look for more information about what might be going on?

Thanks,

Chris
