+1 again 8-)

On Mar 23, 2006, at 2:26 PM, Yoram Arnon wrote:

Another idea, in addition to an explicit format command, is to configure the name node with the cluster's data nodes, rather than allowing any node to connect ad hoc. A name node would then ignore an unexpected data node. It would also be able to report when a data node is missing, and could make operational decisions based on the number and identity of nodes that are up vs. down.
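
A rough sketch of what that could look like, in Python for brevity. The class and method names are made up for illustration and are not Hadoop's API; the 50% threshold is only an example policy.

```python
# Hypothetical sketch: names and structure are illustrative, not Hadoop's API.
class NameNode:
    def __init__(self, expected_datanodes):
        # Hosts listed in the cluster configuration (assumed to be a plain list).
        self.expected = set(expected_datanodes)
        self.live = set()

    def register(self, host):
        """Accept a data node only if it is configured; ignore it otherwise."""
        if host not in self.expected:
            return False
        self.live.add(host)
        return True

    def missing(self):
        """Configured data nodes that have not reported in."""
        return sorted(self.expected - self.live)

    def is_healthy(self, min_fraction=0.5):
        """Example policy: require a minimum fraction of data nodes to be up."""
        return len(self.live) >= min_fraction * len(self.expected)
```

With this, a node such as nutch0 that is not in the configured list would simply be turned away by register(), and missing() gives the operator the list of nodes that are down.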

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 23, 2006 12:27 PM
To: [email protected]
Subject: Re: DFSck - fsck for hadoop

[EMAIL PROTECTED] wrote:
My error was that I intended to run nutch0 as job.tracker, but not as
a datanode.  So, when I ran bin/start-all.sh to start the cluster, it
seemed to replicate the non-existent filesystem on nutch0; thereby
starting to delete all my precious data.

It would be nice if this were harder to do. A simple solution I proposed is that a new filesystem not be created automatically when a namenode is started in an empty directory. Rather, a 'format' command could be required. A more complex solution might be to have a filesystem id. For example, some bits from each block id issued could be the filesystem id. When datanodes report blocks from a different filesystem, the namenode would ignore them rather than delete them.
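
A minimal sketch of both ideas, in Python for brevity. Everything here is an assumption for illustration: the function names, the "fsid" file, and the 16/48-bit split are hypothetical, since the proposal only says "some bits" of each block id.

```python
import os
import secrets

FS_ID_BITS = 16   # assumed width of the filesystem id
BLOCK_BITS = 48   # assumed width of the per-filesystem block serial

def format_fs(name_dir):
    """Explicit 'format' step: create an empty filesystem with a fresh id."""
    os.makedirs(name_dir, exist_ok=True)
    fs_id = secrets.randbits(FS_ID_BITS)
    with open(os.path.join(name_dir, "fsid"), "w") as f:
        f.write(str(fs_id))
    return fs_id

def start_namenode(name_dir):
    """Refuse to start on an unformatted directory instead of creating a filesystem."""
    path = os.path.join(name_dir, "fsid")
    if not os.path.exists(path):
        raise RuntimeError("no filesystem in %s; run format first" % name_dir)
    with open(path) as f:
        return int(f.read())

def make_block_id(fs_id, serial):
    """Put the filesystem id in the high bits of each block id."""
    return (fs_id << BLOCK_BITS) | (serial & ((1 << BLOCK_BITS) - 1))

def fs_id_of(block_id):
    """Recover the filesystem id from a block id."""
    return block_id >> BLOCK_BITS

def process_block_report(fs_id, known_blocks, reported):
    """Return (blocks to delete, blocks ignored as foreign)."""
    to_delete, ignored = [], []
    for b in reported:
        if fs_id_of(b) != fs_id:
            ignored.append(b)      # from another filesystem: leave untouched
        elif b not in known_blocks:
            to_delete.append(b)    # ours, but no longer referenced
    return to_delete, ignored
```

In the scenario above, a namenode started in an empty directory would fail in start_namenode() rather than come up with an empty filesystem, and even a freshly formatted one would ignore the existing blocks because their ids carry a different filesystem id.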

Doug


