On Fri, Sep 11, 2009 at 12:23 PM, Allen Wittenauer
<[email protected]> wrote:
> On 9/10/09 8:06 PM, "David B. Ritch" <[email protected]> wrote:
>> Thank you both.  That's what we did today.  It seems fairly reasonable
>> when a node has a few disks, say 3-5.  However, at some sites, with
>> larger nodes, it seems more awkward.
>
> Hmm.  The vast majority of sites that I know of are using 4-disk
> configurations.  I'd love to know who is using 5 or more drives and
> have a conversation with them.
>
> [The only people I know of who did terasort on 12 disks are Google...
> and they weren't using Hadoop. :)]
>
>

> Will the running datanode process pick up the fact that an entire
> partition goes missing and reappears empty a few minutes later?

If you lose a data directory, the datanode stops. See
https://issues.apache.org/jira/browse/HDFS-457
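For anyone following along, HDFS-457 adds a knob so a datanode can
tolerate losing some of its volumes instead of exiting outright. A
rough sketch of the relevant hdfs-site.xml entries (property names are
from the post-HDFS-457 era; the example paths are made up, so verify
both against your release):

  <!-- hdfs-site.xml -->
  <property>
    <!-- Comma-separated list of local directories, typically one
         mount point per physical disk -->
    <name>dfs.data.dir</name>
    <value>/data/1/dfs,/data/2/dfs,/data/3/dfs,/data/4/dfs</value>
  </property>
  <property>
    <!-- Added by HDFS-457: how many volumes may fail before the
         datanode shuts itself down (0 = the old stop-on-any-failure
         behavior) -->
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
  </property>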

FYI, we are using eight 1 TB SATA disks per datanode. When I lose a
disk, Hadoop "self-heals": you can control the network bandwidth used
for moving blocks around, and you have the balancer app. With disks
this large you really don't have time to copy the data off by hand,
since the datanode is going to be marked dead in about 10 minutes and
all of its blocks will begin getting re-replicated elsewhere.
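For reference, the bandwidth cap and balancer mentioned above look
roughly like this in the 0.20-era config (dfs.balance.bandwidthPerSec
throttles balancer traffic per datanode, in bytes per second; the
threshold argument is in percent; check the names against your
version):

  <!-- hdfs-site.xml: cap the bandwidth each datanode may spend on
       balancing; 10485760 = 10 MB/s (the default is 1 MB/s) -->
  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>10485760</value>
  </property>

  # Run the balancer until every datanode's utilization is within
  # 5% of the cluster average
  hadoop balancer -threshold 5

The "10 minutes" figure is the namenode's heartbeat timeout; if memory
serves, it works out to roughly 10.5 minutes with the default recheck
and heartbeat intervals before a silent datanode is declared dead.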
