FYI, I just ran a 50-node cluster using one of the new kernels for
Fedora with all nodes forced onto the same 'availability zone' and
there were no timeouts or failed writes.
On Mar 27, 2008, at 4:16 PM, Chris K Wensel wrote:
If it's any consolation, I'm seeing similar behaviors on 0.16.0 when
running on EC2 when I push the number of nodes in the cluster past 40.
On Mar 24, 2008, at 6:31 AM, André Martin wrote:
Thanks for the clarification, dhruba :-)
Anyway, what can cause those other exceptions such as "Could not
get block locations" and "DataXceiver: java.io.EOFException"? Can
anyone give me a little more insight about those exceptions?
And does anyone have a similar workload (frequent writes and
deletion of small files), and what could cause the performance
degradation (see first post)? I think HDFS should be able to
handle two million and more files/blocks...
Also, I observed that some of my datanodes do not "heartbeat" to
the namenode for several seconds (up to 400 :-() from time to time
- when I check those specific datanodes and do a "top", I see the
"du" command running, which seems to have gotten stuck?!?
Thanks and Happy Easter :-)
Cu on the 'net,
Bye - bye,
<<<<< André <<<< >>>> èrbnA >>>>>
dhruba Borthakur wrote:
The namenode lazily instructs a Datanode to delete blocks. As a
response to every heartbeat from a Datanode, the Namenode
instructs it to delete a maximum of 100 blocks. Typically, the
heartbeat periodicity is 3 seconds. The heartbeat thread in the
Datanode deletes the block files synchronously before it can send
the next heartbeat. That's the reason a small number (like 100)
was chosen.
If you have 8 datanodes, your system will probably delete about
800 blocks every 3 seconds.
Thanks,
dhruba
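To put rough numbers on the throttling described above, here is a minimal back-of-the-envelope sketch. It assumes the figures quoted in the message (at most 100 blocks deleted per heartbeat, one heartbeat every 3 seconds), that the pending deletions are spread evenly across the datanodes, and that every heartbeat carries a full batch. The function name `drain_time_seconds` and its parameters are made up for illustration and are not part of Hadoop.

```python
import math


def drain_time_seconds(block_replicas, datanodes,
                       max_per_heartbeat=100, heartbeat_interval_s=3.0):
    """Best-case time to delete `block_replicas` block replicas.

    Assumptions (not taken from the Hadoop source): deletions are spread
    evenly over the datanodes and each heartbeat removes a full batch.
    """
    if datanodes <= 0:
        raise ValueError("datanodes must be positive")
    # Replicas each datanode has to remove, rounded up.
    per_node = math.ceil(block_replicas / datanodes)
    # Heartbeats needed when at most `max_per_heartbeat` go per heartbeat.
    heartbeats = math.ceil(per_node / max_per_heartbeat)
    return heartbeats * heartbeat_interval_s


if __name__ == "__main__":
    for replicas in (400_000, 2_000_000):
        secs = drain_time_seconds(replicas, datanodes=8)
        print(f"{replicas:>9} replicas: at least {secs / 60:.0f} min")
```

Under these assumptions, 8 datanodes need at least 25 minutes to drain 400,000 replicas and about 125 minutes for 2,000,000. Replication multiplies the replica count, so the real figures would be higher, and a slow disk only adds to them.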
-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 21, 2008 3:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question
After waiting a few hours (without having any load), the block
number and "DFS Used" space seem to go down...
My question is: is the hardware simply too weak/slow to send the
block deletion requests to the datanodes in a timely manner, or do
those "crappy" HDDs simply cause the delay? I noticed that it
can take up to 40 minutes to delete ~400,000 files at once
manually using "rm -r"...
Actually - my main concern is why the performance, i.e. the
throughput, goes down - any ideas?
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/