Right, I totally forgot about the replication factor... However, I have sometimes noticed block-to-file ratios as high as 5:1...
Is the delay in block deletion/reclamation intended behavior?

Jeff Eastman wrote:
That makes the math come out a lot closer (3*423763=1271289). I've also
noticed there is some delay in reclaiming unused blocks, so what you are
seeing in terms of block allocations does not surprise me.
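
As a quick sanity check, here is the arithmetic from the reply above as a minimal sketch (the file and block counts are the ones quoted from the namenode WebUI later in this thread; the assumption is that nearly every file occupies a single block, replicated 3 times):

```python
# Reproducing Jeff's arithmetic: if every file maps to one block and each
# block is replicated 3 times, the reported block count should be roughly
# files * replication.
files = 423763        # files and directories, from the namenode WebUI
blocks = 1480735      # blocks, from the namenode WebUI
replication = 3       # default replication factor

expected = files * replication
print(expected)                  # 1271289, reasonably close to 1480735
print(round(blocks / files, 2))  # observed ratio: 3.49 blocks per file
```

The leftover gap between 1271289 and 1480735 is what the delayed reclamation of invalidated blocks would account for.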

-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21, 2008 2:36 PM
To: [email protected]
Subject: Re: Performance / cluster scaling question

3 - the default one...

Jeff Eastman wrote:
What's your replication factor?
Jeff


-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21, 2008 2:25 PM
To: [email protected]
Subject: Performance / cluster scaling question

Hi everyone,
I'm running a distributed system that consists of 50 spiders/crawlers and 8
server nodes, with a Hadoop DFS cluster of 8 datanodes and a
namenode...
Each spider has 5 job-processing/data-crawling threads and puts the
crawled data as one complete file onto the DFS - additionally, splits are
created for each server node and put onto the DFS as files
as well. So basically there are 50*5*9 = ~2250 concurrent writes across
8 datanodes.
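
For reference, the write-load arithmetic above works out as follows (a rough sketch; the per-datanode figure assumes the write streams spread evenly across the cluster):

```python
spiders = 50
threads_per_spider = 5
files_per_job = 1 + 8   # one complete file plus one split per server node
datanodes = 8

concurrent_writes = spiders * threads_per_spider * files_per_job
print(concurrent_writes)              # 2250 concurrent writes
print(concurrent_writes / datanodes)  # 281.25 write streams per datanode
```

Close to 300 simultaneous write pipelines per datanode is a substantial load, which is relevant to the saturation question below.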
The splits are read by the server nodes and deleted afterwards,
so those (split) files exist for only a few seconds to minutes...
Since 99% of the files have a size of less than 64 MB (the default block
size), I expected the number of files to be roughly equal to the number
of blocks. After running the system for 24 hours, the namenode WebUI shows
423763 files and directories and 1480735 blocks. It looks like the
system is not catching up with deleting all the invalidated blocks - is
my assumption correct?
Also, I noticed that the overall performance of the cluster goes down
(see attached image).
There are a bunch of "Could not get block locations. Aborting..."
exceptions, and they seem to appear more frequently towards
the end of the experiment.

java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1824)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
So, is the cluster simply saturated by such frequent creation
and deletion of files, or is the network the actual bottleneck? The
workload does not change at all during the whole experiment.
On cluster side I see lots of the following exceptions:

2008-03-21 20:28:05,411 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 1 for block blk_6757062148746339382 terminating
2008-03-21 20:28:05,411 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_6757062148746339382 received exception java.io.EOFException
2008-03-21 20:28:05,411 ERROR org.apache.hadoop.dfs.DataNode: 141.xxx.xxx.xxx:50010:DataXceiver: java.io.EOFException
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
    at java.lang.Thread.run(Unknown Source)
2008-03-21 19:26:46,535 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-7369396710977076579 received exception java.net.SocketException: Connection reset
2008-03-21 19:26:46,535 ERROR org.apache.hadoop.dfs.DataNode: 141.xxx.xxx.xxx:50010:DataXceiver: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
    at java.lang.Thread.run(Unknown Source)

I'm running Hadoop 0.16.1 - has anyone seen the same or a similar
behavior?
How can the performance degradation be avoided? More datanodes? Why
does block deletion not seem to keep up with the deletion of the files?
Thanks in advance for your insights, ideas & suggestions :-)

Cu on the 'net,
                        Bye - bye,

                                   <<<<< André <<<< >>>> èrbnA >>>>>




