That makes the math come out a lot closer (3 * 423763 = 1271289). I've also noticed there is some delay in reclaiming unused blocks, so what you are seeing in terms of block allocations does not surprise me.
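To spell out the arithmetic: with replication factor 3, every file carries roughly three block replicas, so ~423k files would account for ~1.27M of the 1.48M blocks the WebUI reports, and the remainder would be invalidated blocks that have not been reclaimed yet. A throwaway sketch of that calculation, using only the numbers from your mail and assuming the WebUI block figure counts replicas:

    // Rough check of expected vs. reported block counts (numbers taken from the
    // namenode WebUI figures quoted below; the replica assumption is mine).
    public class BlockCountCheck {
        public static void main(String[] args) {
            long filesAndDirs   = 423763;   // "files and directories" from the WebUI
            long reportedBlocks = 1480735;  // "blocks" from the WebUI
            int  replication    = 3;        // default dfs.replication

            long expectedBlocks  = filesAndDirs * replication;       // 1271289
            long notYetReclaimed = reportedBlocks - expectedBlocks;  // ~209k blocks

            System.out.println("expected ~" + expectedBlocks + " blocks, ~"
                    + notYetReclaimed + " presumably awaiting deletion");
        }
    }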
> -----Original Message-----
> From: André Martin [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 21, 2008 2:36 PM
> To: [email protected]
> Subject: Re: Performance / cluster scaling question
>
> 3 - the default one...
>
> Jeff Eastman wrote:
> > What's your replication factor?
> > Jeff
> >
> >> -----Original Message-----
> >> From: André Martin [mailto:[EMAIL PROTECTED]
> >> Sent: Friday, March 21, 2008 2:25 PM
> >> To: [email protected]
> >> Subject: Performance / cluster scaling question
> >>
> >> Hi everyone,
> >> I am running a distributed system that consists of 50 spiders/crawlers
> >> and 8 server nodes on top of a Hadoop DFS cluster with 8 datanodes and
> >> a namenode.
> >> Each spider has 5 job-processing / data-crawling threads and puts the
> >> crawled data as one complete file onto the DFS; in addition, splits are
> >> created for each server node and put onto the DFS as files as well. So
> >> there are roughly 50 * 5 * 9 = 2250 concurrent writes across the
> >> 8 datanodes.
> >> The splits are read by the server nodes and deleted afterwards, so those
> >> split files exist for only a few seconds to minutes.
> >> Since 99% of the files are smaller than 64 MB (the default block size),
> >> I expected the number of blocks to be roughly equal to the number of
> >> files. After running the system for 24 hours, the namenode WebUI shows
> >> 423763 files and directories but 1480735 blocks. It looks like the
> >> system does not catch up with deleting all the invalidated blocks - is
> >> that assumption correct?
> >> I also noticed that the overall performance of the cluster goes down
> >> over time (see attached image).
> >> There are a bunch of "Could not get block locations. Aborting..."
> >> exceptions, and they seem to appear more frequently towards the end of
> >> the experiment:
> >>
> >>> java.io.IOException: Could not get block locations. Aborting...
> >>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1824)
> >>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
> >>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
> >>
> >> So, is the cluster simply saturated by such frequent creation and
> >> deletion of files, or is the network the actual bottleneck? The
> >> workload does not change at all during the whole experiment.
> >> On the cluster side I see lots of the following exceptions:
> >>
> >>> 2008-03-21 20:28:05,411 INFO org.apache.hadoop.dfs.DataNode:
> >>> PacketResponder 1 for block blk_6757062148746339382 terminating
> >>> 2008-03-21 20:28:05,411 INFO org.apache.hadoop.dfs.DataNode:
> >>> writeBlock blk_6757062148746339382 received exception java.io.EOFException
> >>> 2008-03-21 20:28:05,411 ERROR org.apache.hadoop.dfs.DataNode:
> >>> 141.xxx.xxx.xxx:50010:DataXceiver: java.io.EOFException
> >>>     at java.io.DataInputStream.readInt(Unknown Source)
> >>>     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
> >>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
> >>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
> >>>     at java.lang.Thread.run(Unknown Source)
> >>>
> >>> 2008-03-21 19:26:46,535 INFO org.apache.hadoop.dfs.DataNode:
> >>> writeBlock blk_-7369396710977076579 received exception
> >>> java.net.SocketException: Connection reset
> >>> 2008-03-21 19:26:46,535 ERROR org.apache.hadoop.dfs.DataNode:
> >>> 141.xxx.xxx.xxx:50010:DataXceiver: java.net.SocketException: Connection reset
> >>>     at java.net.SocketInputStream.read(Unknown Source)
> >>>     at java.io.BufferedInputStream.fill(Unknown Source)
> >>>     at java.io.BufferedInputStream.read(Unknown Source)
> >>>     at java.io.DataInputStream.readInt(Unknown Source)
> >>>     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
> >>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
> >>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
> >>>     at java.lang.Thread.run(Unknown Source)
> >>
> >> I'm running Hadoop 0.16.1 - has anyone had the same or a similar
> >> experience?
> >> How can the performance degradation be avoided? More datanodes? Why does
> >> block deletion not seem to catch up with the deletion of the files?
> >> Thanks in advance for your insights, ideas & suggestions :-)
> >>
> >> Cu on the 'net,
> >>                       Bye - bye,
> >>
> >>            <<<<< André <<<< >>>> èrbnA >>>>>
> >>
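For what it's worth, here is how I picture the create/read/delete cycle you describe, sketched against the FileSystem API. This is a minimal illustration under my own assumptions (hypothetical paths, sizes and helper names), not your actual spider or server code:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SplitWriteDeleteSketch {

        // Spider side: one complete file per crawl job plus one split per server
        // node - 50 spiders * 5 threads * (1 + 8) files gives the ~2250
        // concurrent writes mentioned above.
        public static void writeCrawlResult(FileSystem fs, byte[] data,
                                            int serverNodes, String jobId) throws IOException {
            Path complete = new Path("/crawl/complete/" + jobId);   // hypothetical layout
            FSDataOutputStream out = fs.create(complete);
            out.write(data);
            out.close();

            for (int i = 0; i < serverNodes; i++) {
                Path split = new Path("/crawl/splits/server" + i + "/" + jobId);
                FSDataOutputStream splitOut = fs.create(split);
                splitOut.write(data);   // in reality only that server's share of the data
                splitOut.close();
            }
        }

        // Server side: read the split, then delete it, so the file only lives
        // for seconds to minutes.
        public static void consumeSplit(FileSystem fs, Path split) throws IOException {
            FSDataInputStream in = fs.open(split);
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // process the data...
            }
            in.close();
            fs.delete(split);   // single-argument delete from the 0.16-era API
        }

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            writeCrawlResult(fs, "crawled page data".getBytes(), 8, "job-0001");
            consumeSplit(fs, new Path("/crawl/splits/server0/job-0001"));
        }
    }

With replication 3, every one of those short-lived splits is still written to three datanodes and later has to be invalidated on all three, which fits the delayed block reclamation mentioned at the top of this mail.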
