Another idea, in addition to an explicit format command, is to configure the
name node with the cluster's data nodes, rather than allowing any node to
connect ad hoc. A name node would then ignore an unexpected data node. It
would also be able to report when a data node is missing and could make
operational decisions based on the number and identity of nodes that are up
vs. down.
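
To make the idea concrete, here is a minimal sketch in Python, not actual
Hadoop code. The class name, method names, and the host names nutch1 to nutch3
are hypothetical. It assumes the name node is handed a fixed list of data
nodes at startup.

```python
class NameNode:
    """Hypothetical name node that only accepts configured data nodes."""

    def __init__(self, configured_nodes):
        self.configured = set(configured_nodes)  # data nodes named in the config
        self.live = set()                        # configured nodes that have checked in

    def register(self, node):
        """Handle a data node checking in; return False if it is unexpected."""
        if node not in self.configured:
            return False  # ignore ad hoc connections instead of acting on them
        self.live.add(node)
        return True

    def missing(self):
        """Configured data nodes that have not checked in."""
        return sorted(self.configured - self.live)

    def can_serve(self, min_live):
        """Example operational decision: require a minimum number of live nodes."""
        return len(self.live) >= min_live


nn = NameNode(["nutch1", "nutch2", "nutch3"])
nn.register("nutch1")
nn.register("nutch2")
nn.register("nutch0")  # not configured as a data node, so it is ignored
```

With this, nn.missing() reports nutch3 as down, and a policy such as
nn.can_serve(2) could gate any action that deletes or re-replicates blocks.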

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 23, 2006 12:27 PM
To: [email protected]
Subject: Re: DFSck - fsck for hadoop

[EMAIL PROTECTED] wrote:
> My error was that I intended to run nutch0 as job.tracker, but not as 
> a datanode.  So, when I ran bin/start-all.sh to start the cluster, it 
> seemed to replicate the non-existent filesystem on nutch0; thereby 
> starting to delete all my precious data.

It would be nice if this were harder to do.  A simple solution I proposed
would be to make it so that a new filesystem is not created automatically
when a namenode is started in an empty directory.  Rather, a 'format' command
could be required.  A more complex solution might be to have a filesystem
id.  For example, some bits from each block id issued could be the
filesystem id.  When datanodes report blocks from a different filesystem,
the namenode would ignore them rather than delete them.
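
A rough sketch of the filesystem id idea, in Python rather than actual Hadoop
code. The 16/48 bit split of a 64-bit block id and all function names are
assumptions made for illustration only.

```python
import random

FS_ID_BITS = 16                  # assumed: top 16 bits carry the filesystem id
RANDOM_BITS = 64 - FS_ID_BITS    # remaining bits identify the block


def new_block_id(fs_id):
    """Issue a block id whose high bits are the filesystem id."""
    if not 0 <= fs_id < (1 << FS_ID_BITS):
        raise ValueError("fs_id does not fit in FS_ID_BITS")
    return (fs_id << RANDOM_BITS) | random.getrandbits(RANDOM_BITS)


def fs_id_of(block_id):
    """Recover the filesystem id from a block id."""
    return block_id >> RANDOM_BITS


def handle_block_report(fs_id, known_blocks, reported):
    """Split reported blocks into (to_delete, ignored).

    Blocks from another filesystem are ignored rather than deleted.
    Blocks from this filesystem that the namenode does not know are stale.
    """
    to_delete, ignored = [], []
    for block_id in reported:
        if fs_id_of(block_id) != fs_id:
            ignored.append(block_id)
        elif block_id not in known_blocks:
            to_delete.append(block_id)
    return to_delete, ignored
```

A datanode that still holds blocks from an older filesystem would then land in
the ignored list and keep its data, even if it connects to a freshly created
namenode.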

Doug

