On Thu, Mar 3, 2011 at 6:44 PM, Mark Kerzner <[email protected]> wrote:

> Hi,
>
> in my small development cluster I have a master/slave node and a slave
> node,
> and I shut down the slave node at night. I often see that my HDFS is
> corrupted, and I have to reformat the name node and to delete the data
> directory.
>

Why do you shut down the slave at night? HDFS should only be corrupted if
you're missing all copies of a block. With a replication factor of 3
(default) you should have 100% of the data on both nodes (if you only have 2
nodes). If you've dialed it down to 1, simply starting the slave back up
should "un-corrupt" HDFS. You definitely don't want to be doing this to HDFS
regularly (dropping nodes from the cluster and re-adding them) unless you're
trying to test HDFS' failure semantics.
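
To see whether replication is actually the problem, check what the NN thinks
it has. A quick sketch, assuming a stock 0.20-style layout where the scripts
live in bin/ and the config in conf/:

    # report overall block health: missing, corrupt, under-replicated blocks
    bin/hadoop fsck /

    # check the configured default replication factor
    grep -A1 dfs.replication conf/hdfs-site.xml

fsck will tell you right away whether blocks are missing outright or merely
under-replicated while the slave is down.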

> It finally dawns on me that with such small cluster I better shut the
> daemons down, for otherwise they are trying too hard to compensate for the
> missing node and eventually it goes bad. Is my understanding correct?
>

It doesn't "eventually go bad." If the NN sees a DN disappear it may start
re-replicating data to another node. In such a small cluster, maybe there's
nowhere else to get the blocks from, but I bet you dialed the replication
factor down to 1 (or have code that writes files with a rep factor of 1 like
teragen / terasort).
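
If that's the case, you can check the replication factor of existing files
and bump it back up. A rough example with the stock Hadoop CLI (the path is
just a placeholder for wherever your data lives):

    # list files; the second column is the per-file replication factor
    bin/hadoop fs -ls /user/mark

    # raise replication on everything under a path back to 2
    bin/hadoop fs -setrep -R -w 2 /user/mark

The -w flag waits for re-replication to finish, which can only complete once
both DNs are up.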

In short, if you're going to shut down nodes like this, put the NN into safe
mode so it doesn't freak out (which will also make the cluster unusable
during that time), but there's definitely no need to be reformatting HDFS.
Just re-introduce the DN you shut down to the cluster.
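
If you do want to keep the nightly shutdown, the sequence would look roughly
like this (daemon script names assume the standard bin/ scripts shipped with
Hadoop; run the datanode commands on the slave itself):

    # on the master, before shutting the slave down
    bin/hadoop dfsadmin -safemode enter

    # on the slave
    bin/hadoop-daemon.sh stop datanode

    # next morning, on the slave
    bin/hadoop-daemon.sh start datanode

    # back on the master, once the DN has re-registered
    bin/hadoop dfsadmin -safemode leave
    bin/hadoop fsck /

Keep in mind that while the NN is in safe mode no writes are possible, so
this only makes sense if nothing runs against the cluster overnight.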


>
> Thank you,
> Mark
>

-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com
