Hi Chris & Hadoopers,
we changed our system architecture so that most of the data is now
streamed directly from the spider/crawler nodes instead of being staged
as temporary files on the DFS - now it performs way better and the
exceptions are gone :-) ...this seems to be a good decision when you
only have a relatively small cluster (like ours w/ 8 datanodes), where
the deletion of blocks cannot catch up with the creation of new temp
files (because of the max 100 blocks per 3-second heartbeat deletion
"restriction").
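For the curious, the change boils down to something like the following
minimal sketch (all names here - CrawlerStreamer, the collector host and
port, the record format - are made up for illustration, not our actual
code): each spider pushes its records straight over a socket to the
processing node, so no short-lived temp files (and thus no blocks) are
ever created on the DFS:

  import java.io.DataOutputStream;
  import java.net.Socket;

  public class CrawlerStreamer {
      public static void main(String[] args) throws Exception {
          // Hypothetical collector endpoint on the processing node.
          try (Socket s = new Socket("collector.example.org", 9099);
               DataOutputStream out =
                       new DataOutputStream(s.getOutputStream())) {
              // Each crawled record goes straight onto the wire instead
              // of into a short-lived temp file on the DFS.
              out.writeUTF("http://example.org/some-page");
              out.writeUTF("<html>crawled content</html>");
          }
      }
  }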
Cu on the 'net,
Bye - bye,
<<<<< André <<<< >>>> èrbnA >>>>>
Chris K Wensel wrote:
If it's any consolation, I'm seeing similar behaviors on 0.16.0 running
on EC2 once I push the number of nodes in the cluster past 40.
On Mar 24, 2008, at 6:31 AM, André Martin wrote:
Thanks for the clarification, dhruba :-)
Anyway, what can cause those other exceptions, such as "Could not get
block locations" and "DataXceiver: java.io.EOFException"? Can anyone
give me a little more insight into them?
And does anyone have a similar workload (frequent writes and deletions
of small files), and what could cause the performance degradation
(see first post)? I think HDFS should be able to handle two million
or more files/blocks...
Also, I observed that some of my datanodes do not "heartbeat" to the
namenode for several seconds (up to 400 :-() from time to time - when
I check those specific datanodes and run "top", I see a "du" command
running that seems to have gotten stuck?!?
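My guess at the failure mode, as a minimal sketch (hypothetical code,
not Hadoop's actual internals - I'm only assuming the disk-usage
accounting shells out to "du" somewhere on, or blocking, the heartbeat
path):

  import java.util.concurrent.TimeUnit;

  public class HeartbeatLoopSketch {
      public static void main(String[] args) throws Exception {
          while (true) {
              sendHeartbeat(); // supposed to go out every ~3 seconds

              // Blocking call: over a directory tree with millions of
              // small block files, "du -sk" can run for minutes on slow
              // disks, and no heartbeat goes out until it returns.
              new ProcessBuilder("du", "-sk", "/data/dfs").start().waitFor();

              TimeUnit.SECONDS.sleep(3);
          }
      }

      static void sendHeartbeat() {
          System.out.println("heartbeat @ " + System.currentTimeMillis());
      }
  }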
Thanks and Happy Easter :-)
Cu on the 'net,
Bye - bye,
<<<<< André <<<< >>>> èrbnA >>>>>
dhruba Borthakur wrote:
The namenode lazily instructs a datanode to delete blocks. In response
to each heartbeat from a datanode, the namenode instructs it to delete
a maximum of 100 blocks. Typically, the heartbeat period is 3 seconds.
The heartbeat thread in the datanode deletes the block files
synchronously before it can send the next heartbeat; that's the reason
a small number (like 100) was chosen. If you have 8 datanodes, your
system will delete at most about 800 blocks every 3 seconds.
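To put numbers on it, a quick back-of-envelope sketch (the constants
mirror the defaults described above; the class name and the two-million
figure from earlier in the thread are just for illustration):

  public class DeletionRateSketch {
      static final int BLOCKS_PER_HEARTBEAT = 100; // max deletions per heartbeat reply
      static final int HEARTBEAT_SECONDS = 3;      // typical heartbeat period

      public static void main(String[] args) {
          int datanodes = 8;                // cluster size from this thread
          long blocksToDelete = 2_000_000L; // e.g. two million small files/blocks

          // Cluster-wide cap: every datanode deletes at most 100 blocks
          // per 3-second heartbeat, independently of the others.
          double blocksPerSecond =
                  datanodes * (double) BLOCKS_PER_HEARTBEAT / HEARTBEAT_SECONDS;
          double seconds = blocksToDelete / blocksPerSecond;

          System.out.printf("Max deletion rate: %.0f blocks/s%n", blocksPerSecond);
          System.out.printf("Time to clear %,d blocks: ~%.0f minutes%n",
                  blocksToDelete, seconds / 60.0);
      }
  }

That works out to roughly 267 blocks/s for 8 datanodes, i.e. on the
order of two hours to clear two million blocks even with no other load,
which would be consistent with the slow decline in block count you
observed.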
Thanks,
dhruba
-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 21, 2008 3:06 PM
To: [email protected]
Subject: Re: Performance / cluster scaling question
After waiting a few hours (without any load), the block count and
"DFS Used" space seem to go down...
My question is: is the hardware simply too weak/slow to send the
block deletion requests to the datanodes in a timely manner, or do
those "crappy" HDDs simply cause the delay? I noticed that it can
take up to 40 minutes to delete ~400,000 files at once manually
using "rm -r"...
Actually - my main concern is why the performance, i.e. the
throughput, goes down - any ideas?