On 9/13/07 6:00 AM, "C G" <[EMAIL PROTECTED]> wrote:

>   I'd like to run nodes with around 2T of local disk set up as JBOD.  So I
> would have 4 separate file systems per machine, for example /hdfs_a, /hdfs_b,
> /hdfs_c, /hdfs_d .  Is it possible to configure things so that HDFS knows
> about all 4 file systems?

    This is exactly how we configure all of our nodes.  Each node has 4
drives, with a chunk carved out of them for / and swap.  The rest of the
space is dedicated to Hadoop: HDFS, etc.
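
    A rough sketch of what such a configuration could look like, assuming
the Hadoop property dfs.data.dir, which takes a comma-separated list of
local directories.  The mount points are the ones from the question; the
"data" subdirectory and the helper name render_data_dir_property are made
up for illustration.

```python
import xml.etree.ElementTree as ET


def render_data_dir_property(mount_points, subdir="data"):
    """Build a <property> element listing one HDFS data directory per disk.

    The subdirectory name is an assumption for illustration; any directory
    on each mounted file system would do.
    """
    dirs = [f"{mp.rstrip('/')}/{subdir}" for mp in mount_points]
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.data.dir"
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Prints a single <property> element to paste into the site config.
    print(render_data_dir_property(["/hdfs_a", "/hdfs_b", "/hdfs_c", "/hdfs_d"]))
```

The resulting element would go into the data node's site configuration
file, so that each of the four file systems holds part of the node's blocks.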

> Since we're using HDFS replication I see no point in
> using RAID-anything...to me that's the whole point of replication.  Comments?

    We haven't done any tests around what sort of performance difference a
stripe+concat configuration would make.  In theory, it might be somewhat
faster since each file system would be spread across more spindles.
However, you also increase your risk since a single drive failure would take
out the entire data node rather than just ~25% of it.  As you point out, the
replication factor makes RAID fairly useless from a data recovery
perspective.
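
    The risk side of that trade-off can be put in rough numbers.  The
sketch below assumes equally sized, evenly filled drives, and that a striped
or concatenated volume without redundancy is lost entirely when any one
member drive fails.  The function name is hypothetical.

```python
def fraction_lost_on_one_drive_failure(num_drives, striped):
    """Fraction of a data node's blocks lost when a single drive fails.

    Assumes equally sized, evenly filled drives, and that a striped or
    concatenated volume without redundancy is lost entirely if any
    member drive fails.
    """
    if num_drives < 1:
        raise ValueError("num_drives must be at least 1")
    return 1.0 if striped else 1.0 / num_drives


if __name__ == "__main__":
    print(fraction_lost_on_one_drive_failure(4, striped=False))  # 0.25
    print(fraction_lost_on_one_drive_failure(4, striped=True))   # 1.0
```

Either way HDFS re-replicates the lost blocks from other nodes; the
difference is how much data has to be copied after each failure.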

>   If you can't run multiple namenodes, then that sort of implies the machine
> which is hosting *the* namenode needs to do all the traditional things to
> protect against data loss/corruption, including frequent backups, RAID
> mirroring, etc.  

    We configure our primary namenode such that it has two copies of the
fsimage and edits file: one on the local machine and one on an NFS server.
If for some reason the local machine blows up, we can rebuild with the NFS
version.  
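
    A setup like this is typically arranged by listing both directories,
comma-separated, in dfs.name.dir.  As a sanity check that the NFS copy is
actually usable for a rebuild, something like the following sketch could
compare the two copies of a file by checksum.  The helper names are
hypothetical, and the comparison is only meaningful while the namenode is
stopped or not writing.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(local_path, nfs_path):
    """True if the local and NFS copies have identical contents."""
    return file_digest(local_path) == file_digest(nfs_path)
```

Calling copies_match() with the paths of the local and NFS copies of
fsimage (and again for edits) returns False if the two have diverged.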
