Another idea, in addition to an explicit format command, is to configure the name node with the cluster's data nodes, rather than allowing any node to connect ad hoc. A name node would then ignore an unexpected data node. It would also be able to report when a data node is missing and could make operational decisions based on the number and identity of nodes that are up vs. down.
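A minimal sketch of the configured-membership idea, written in Python for brevity. The class `DataNodeRegistry` and its methods are hypothetical illustrations, not Hadoop APIs; the sketch assumes each data node identifies itself by a stable name string, and `has_quorum` is only one example of an operational decision based on how many configured nodes are up.

```python
from dataclasses import dataclass, field


@dataclass
class DataNodeRegistry:
    """Hypothetical name-node-side registry: only configured data nodes may join."""
    expected: frozenset[str]
    live: set[str] = field(default_factory=set)

    def register(self, node_id: str) -> bool:
        # Ignore any data node that is not in the configured cluster list.
        if node_id not in self.expected:
            return False
        self.live.add(node_id)
        return True

    def deregister(self, node_id: str) -> None:
        # Mark a configured node as down (e.g. after missed heartbeats).
        self.live.discard(node_id)

    def missing(self) -> set[str]:
        # Configured nodes that have not reported in, or are no longer up.
        return set(self.expected - self.live)

    def has_quorum(self, min_live: int) -> bool:
        # Example decision based on the number of configured nodes that are up.
        return len(self.live) >= min_live


if __name__ == "__main__":
    registry = DataNodeRegistry(expected=frozenset({"nutch1", "nutch2", "nutch3"}))
    registry.register("nutch1")
    registry.register("nutch0")  # not configured as a data node, so it is ignored
    print(sorted(registry.missing()))
```

Because membership comes from configuration rather than from whoever connects, a machine like nutch0 that was never meant to be a data node cannot join, and the name node can name exactly which expected nodes are absent.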
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 12:27 PM
To: [email protected]
Subject: Re: DFSck - fsck for hadoop

[EMAIL PROTECTED] wrote:
> My error was that I intended to run nutch0 as job.tracker, but not as
> a datanode. So, when I ran bin/start-all.sh to start the cluster, it
> seemed to replicate the non-existent filesystem on nutch0; thereby
> starting to delete all my precious data.

It would be nice if this were harder to do.

A simple solution I proposed would be to make it so that a new filesystem is not created automatically when a namenode is started in an empty directory. Rather, a 'format' command could be required.

A more complex solution might be to have a filesystem id. For example, some bits from each block id issued could be the filesystem id. When datanodes report blocks from a different filesystem, the namenode would ignore them rather than delete them.

Doug
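A rough sketch of the filesystem-id proposal, combined with the explicit format step. Everything here is an assumption for illustration: the 16/48 bit split of a 64-bit block id, the function names, and choosing a random id at format time are not taken from Hadoop. What it shows is that blocks reported by a data node from a different filesystem are set aside rather than scheduled for deletion.

```python
import secrets

FS_ID_BITS = 16                 # hypothetical: high bits of a 64-bit block id hold the filesystem id
SEQ_BITS = 64 - FS_ID_BITS      # remaining low bits hold a per-filesystem sequence number
SEQ_MASK = (1 << SEQ_BITS) - 1


def format_filesystem() -> int:
    """Hypothetical explicit 'format' step: pick a fresh random filesystem id."""
    return secrets.randbits(FS_ID_BITS)


def make_block_id(fs_id: int, seq: int) -> int:
    # Pack the filesystem id into the top bits and the sequence number below it.
    if not 0 <= fs_id < (1 << FS_ID_BITS):
        raise ValueError("filesystem id out of range")
    if not 0 <= seq <= SEQ_MASK:
        raise ValueError("sequence number out of range")
    return (fs_id << SEQ_BITS) | seq


def fs_id_of(block_id: int) -> int:
    # Recover the filesystem id from the top bits of a block id.
    return block_id >> SEQ_BITS


def process_block_report(
    fs_id: int, known_blocks: set[int], reported: list[int]
) -> tuple[list[int], list[int]]:
    """Split a data node's block report into (to_delete, ignored).

    Blocks of this filesystem that the name node no longer knows about may be
    deleted; blocks stamped with another filesystem id are ignored and left on disk.
    """
    to_delete: list[int] = []
    ignored: list[int] = []
    for block_id in reported:
        if fs_id_of(block_id) != fs_id:
            ignored.append(block_id)
        elif block_id not in known_blocks:
            to_delete.append(block_id)
    return to_delete, ignored
```

With a random 16-bit id, two independently formatted filesystems share an id with probability 1/65536, so a wider field would make accidental matches correspondingly less likely.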
