FYI, just ran a 50-node cluster using one of the new Fedora kernels, with all nodes forced into the same 'availability zone', and there were no timeouts or failed writes.

On Mar 27, 2008, at 4:16 PM, Chris K Wensel wrote:
If it's any consolation, I'm seeing similar behaviors on 0.16.0 running on EC2 when I push the number of nodes in the cluster past 40.

On Mar 24, 2008, at 6:31 AM, André Martin wrote:
Thanks for the clarification, dhruba :-)
Anyway, what can cause those other exceptions, such as "Could not get block locations" and "DataXceiver: java.io.EOFException"? Can anyone give me a little more insight into them? And does anyone run a similar workload (frequent writes and deletions of small files), and what could cause the performance degradation (see first post)? I think HDFS should be able to handle two million and more files/blocks... Also, I observed that some of my datanodes do not "heartbeat" to the namenode for several seconds (up to 400 :-() from time to time - when I check those specific datanodes and run "top", I see the "du" command running, which seems to have gotten stuck?!?
Thanks and Happy Easter :-)

Cu on the 'net,
                     Bye - bye,

                                <<<<< André <<<< >>>> èrbnA >>>>>

dhruba Borthakur wrote:

The namenode lazily instructs a Datanode to delete blocks. In response to every heartbeat from a Datanode, the Namenode instructs it to delete a maximum of 100 blocks. Typically, the heartbeat periodicity is 3 seconds. The heartbeat thread in the Datanode deletes the block files synchronously before it can send the next heartbeat. That's the reason a small number (like 100) was chosen.

If you have 8 datanodes, your system will probably delete about 800 blocks every 3 seconds.
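To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch (not Hadoop code; the class and method names are made up for illustration) that estimates how long a deletion backlog takes to drain under the defaults described above: a 3-second heartbeat and at most 100 block invalidations per heartbeat per datanode.

public class BlockDeletionEstimate {

    static final int HEARTBEAT_SECONDS = 3;          // typical heartbeat period
    static final int MAX_BLOCKS_PER_HEARTBEAT = 100; // per-datanode cap per heartbeat

    /** Rough seconds to delete totalBlocks spread evenly over the given datanodes. */
    static long secondsToDrain(long totalBlocks, int datanodes) {
        long blocksPerInterval = (long) datanodes * MAX_BLOCKS_PER_HEARTBEAT;
        long intervals = (totalBlocks + blocksPerInterval - 1) / blocksPerInterval; // ceiling
        return intervals * HEARTBEAT_SECONDS;
    }

    public static void main(String[] args) {
        // 8 datanodes can be told to delete at most 800 blocks every 3 seconds,
        // so a backlog of 400,000 blocks needs about 500 heartbeats, i.e. ~25 minutes.
        System.out.println(secondsToDrain(400_000L, 8) + " seconds");
    }
}

Under these assumptions a large backlog can take tens of minutes to clear even when the cluster is otherwise idle, which would be consistent with the slow drop in block count observed below.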

Thanks,
dhruba

-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 21, 2008 3:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question

After waiting a few hours (without any load), the block count and "DFS Used" space seem to go down... My question is: is the hardware simply too weak/slow to send the block deletion requests to the datanodes in a timely manner, or do those "crappy" HDDs cause the delay? I noticed that it can take up to 40 minutes to delete ~400,000 files at once manually using "rm -r"... Actually, my main concern is why the performance, i.e. the throughput, goes down - any ideas?


Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/




Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/



