+1 again 8-)
On Mar 23, 2006, at 2:26 PM, Yoram Arnon wrote:
Another idea, in addition to an explicit format command, is to configure
the name node with the cluster's data nodes, rather than allowing any
node to connect ad hoc. A name node would then ignore an unexpected data
node. It would also be able to report when a data node is missing and
could make operational decisions based on the number and identity of
nodes that are up vs. down.
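
To make the idea concrete, here is a minimal sketch of a name node that only
accepts data nodes from a configured list. This is not Hadoop code: the class
`NameNodeRegistry`, its methods, the `min_fraction` threshold, and the host
names other than nutch0 are all hypothetical and exist only for illustration.

```python
class NameNodeRegistry:
    """Hypothetical sketch: track which configured data nodes have checked in."""

    def __init__(self, configured_nodes):
        # The fixed set of data nodes the cluster is expected to have.
        self.configured = frozenset(configured_nodes)
        # Configured nodes that have registered so far.
        self.live = set()

    def register(self, node):
        """Accept a data node only if it appears in the configured list."""
        if node not in self.configured:
            return False  # unexpected node: ignore it, take no action on its blocks
        self.live.add(node)
        return True

    def missing(self):
        """Return the configured data nodes that have not registered, sorted."""
        return sorted(self.configured - self.live)

    def enough_nodes_up(self, min_fraction=0.5):
        """Example operational check: is at least min_fraction of the cluster up?"""
        return len(self.live) >= min_fraction * len(self.configured)


# Assumed cluster layout: nutch0 is deliberately not a data node.
registry = NameNodeRegistry(["nutch1", "nutch2", "nutch3"])
registry.register("nutch0")   # rejected, not in the configured list
registry.register("nutch1")   # accepted
print(registry.missing())
```

With a list like this, a node that connects by mistake is simply ignored, and
the name node can see which expected nodes are absent before it decides to
re-replicate or delete anything.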
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 23, 2006 12:27 PM
To: [email protected]
Subject: Re: DFSck - fsck for hadoop
[EMAIL PROTECTED] wrote:
My error was that I intended to run nutch0 as job.tracker, but not as
a datanode. So, when I ran bin/start-all.sh to start the cluster, it
seemed to replicate the non-existent filesystem on nutch0; thereby
starting to delete all my precious data.
It would be nice if this were harder to do. A simple solution I proposed
would be to make it so that a new filesystem is not created automatically
when a namenode is started in an empty directory. Rather, a 'format'
command could be required. A more complex solution might be to have a
filesystem id. For example, some bits from each block id issued could be
the filesystem id. When datanodes report blocks from a different
filesystem, the namenode would ignore them rather than delete them.
Doug
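
The filesystem id idea above could look roughly like the following sketch. It
is an illustration under stated assumptions, not the actual Hadoop design: the
16/48 bit split, the function names, and the shape of the block report are all
hypothetical.

```python
# Assumed layout: a 64-bit block id whose top 16 bits carry the filesystem id.
FS_ID_BITS = 16
SERIAL_BITS = 48


def make_block_id(fs_id, serial):
    """Pack a filesystem id and a per-filesystem serial number into one block id."""
    if not 0 <= fs_id < (1 << FS_ID_BITS):
        raise ValueError("fs_id out of range")
    if not 0 <= serial < (1 << SERIAL_BITS):
        raise ValueError("serial out of range")
    return (fs_id << SERIAL_BITS) | serial


def fs_id_of(block_id):
    """Extract the filesystem id from the top bits of a block id."""
    return block_id >> SERIAL_BITS


def handle_block_report(namenode_fs_id, known_blocks, reported_blocks):
    """Split a data node's report into blocks to delete and blocks to ignore.

    Blocks from a different filesystem are ignored rather than deleted.
    Blocks from this filesystem that the name node does not know are
    treated as obsolete and scheduled for deletion.
    """
    to_delete, ignored = [], []
    for block_id in reported_blocks:
        if fs_id_of(block_id) != namenode_fs_id:
            ignored.append(block_id)
        elif block_id not in known_blocks:
            to_delete.append(block_id)
    return to_delete, ignored


# A freshly created filesystem (id 9) receives a report that still
# contains a block written under an older filesystem (id 7).
old_block = make_block_id(7, 1)
known = {make_block_id(9, 1)}
report = [old_block, make_block_id(9, 1), make_block_id(9, 2)]
to_delete, ignored = handle_block_report(9, known, report)
```

In this sketch an accidentally started, empty name node would carry a new
filesystem id, so the blocks reported by existing data nodes would land in
`ignored` instead of being deleted.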