That makes the math come out a lot closer (3 * 423763 = 1271289). I've also noticed there is some delay in reclaiming unused blocks, so what you are seeing in terms of block allocations does not surprise me.
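To spell out the arithmetic: with replication factor 3, every file carries roughly three block replicas, so ~423k files would account for ~1.27M of the 1.48M blocks the WebUI reports, and the remainder would be invalidated blocks that have not been reclaimed yet. A throwaway sketch of that calculation, using only the numbers from your mail and assuming the WebUI block figure counts replicas:

    // Rough check of expected vs. reported block counts (numbers taken from the
    // namenode WebUI figures quoted below; the replica assumption is mine).
    public class BlockCountCheck {
        public static void main(String[] args) {
            long filesAndDirs   = 423763;   // "files and directories" from the WebUI
            long reportedBlocks = 1480735;  // "blocks" from the WebUI
            int  replication    = 3;        // default dfs.replication

            long expectedBlocks  = filesAndDirs * replication;       // 1271289
            long notYetReclaimed = reportedBlocks - expectedBlocks;  // ~209k blocks

            System.out.println("expected ~" + expectedBlocks + " blocks, ~"
                    + notYetReclaimed + " presumably awaiting deletion");
        }
    }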
> -----Original Message-----
> From: André Martin [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 21, 2008 2:36 PM
> To: [email protected]
> Subject: Re: Performance / cluster scaling question
>
> 3 - the default one...
>
> Jeff Eastman wrote:
> > What's your replication factor?
> > Jeff
> >
> >> -----Original Message-----
> >> From: André Martin [mailto:[EMAIL PROTECTED]
> >> Sent: Friday, March 21, 2008 2:25 PM
> >> To: [email protected]
> >> Subject: Performance / cluster scaling question
> >>
> >> Hi everyone,
> >> I am running a distributed system that consists of 50 spiders/crawlers
> >> and 8 server nodes on top of a Hadoop DFS cluster with 8 datanodes and
> >> a namenode.
> >> Each spider has 5 job-processing / data-crawling threads and puts the
> >> crawled data as one complete file onto the DFS; in addition, splits are
> >> created for each server node and put onto the DFS as files as well. So
> >> there are roughly 50 * 5 * 9 = 2250 concurrent writes across the
> >> 8 datanodes.
> >> The splits are read by the server nodes and deleted afterwards, so those
> >> split files exist for only a few seconds to minutes.
> >> Since 99% of the files are smaller than 64 MB (the default block size),
> >> I expected the number of blocks to be roughly equal to the number of
> >> files. After running the system for 24 hours, the namenode WebUI shows
> >> 423763 files and directories but 1480735 blocks. It looks like the
> >> system does not catch up with deleting all the invalidated blocks - is
> >> that assumption correct?
> >> I also noticed that the overall performance of the cluster goes down
> >> over time (see attached image).
> >> There are a bunch of "Could not get block locations. Aborting..."
> >> exceptions, and they seem to appear more frequently towards the end of
> >> the experiment:
> >>
> >>> java.io.IOException: Could not get block locations. Aborting...
> >>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1824)
> >>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
> >>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)
> >>
> >> So, is the cluster simply saturated by such frequent creation and
> >> deletion of files, or is the network the actual bottleneck? The
> >> workload does not change at all during the whole experiment.
> >> On the cluster side I see lots of the following exceptions:
> >>
> >>> 2008-03-21 20:28:05,411 INFO org.apache.hadoop.dfs.DataNode:
> >>> PacketResponder 1 for block blk_6757062148746339382 terminating
> >>> 2008-03-21 20:28:05,411 INFO org.apache.hadoop.dfs.DataNode:
> >>> writeBlock blk_6757062148746339382 received exception java.io.EOFException
> >>> 2008-03-21 20:28:05,411 ERROR org.apache.hadoop.dfs.DataNode:
> >>> 141.xxx.xxx.xxx:50010:DataXceiver: java.io.EOFException
> >>>     at java.io.DataInputStream.readInt(Unknown Source)
> >>>     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
> >>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
> >>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
> >>>     at java.lang.Thread.run(Unknown Source)
> >>>
> >>> 2008-03-21 19:26:46,535 INFO org.apache.hadoop.dfs.DataNode:
> >>> writeBlock blk_-7369396710977076579 received exception
> >>> java.net.SocketException: Connection reset
> >>> 2008-03-21 19:26:46,535 ERROR org.apache.hadoop.dfs.DataNode:
> >>> 141.xxx.xxx.xxx:50010:DataXceiver: java.net.SocketException: Connection reset
> >>>     at java.net.SocketInputStream.read(Unknown Source)
> >>>     at java.io.BufferedInputStream.fill(Unknown Source)
> >>>     at java.io.BufferedInputStream.read(Unknown Source)
> >>>     at java.io.DataInputStream.readInt(Unknown Source)
> >>>     at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2263)
> >>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1150)
> >>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
> >>>     at java.lang.Thread.run(Unknown Source)
> >>
> >> I'm running Hadoop 0.16.1 - has anyone had the same or a similar
> >> experience?
> >> How can the performance degradation be avoided? More datanodes? Why does
> >> block deletion not seem to catch up with the deletion of the files?
> >> Thanks in advance for your insights, ideas & suggestions :-)
> >>
> >> Cu on the 'net,
> >>                       Bye - bye,
> >>
> >>            <<<<< André <<<< >>>> èrbnA >>>>>
> >>
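For what it's worth, here is how I picture the create/read/delete cycle you describe, sketched against the FileSystem API. This is a minimal illustration under my own assumptions (hypothetical paths, sizes and helper names), not your actual spider or server code:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SplitWriteDeleteSketch {

        // Spider side: one complete file per crawl job plus one split per server
        // node - 50 spiders * 5 threads * (1 + 8) files gives the ~2250
        // concurrent writes mentioned above.
        public static void writeCrawlResult(FileSystem fs, byte[] data,
                                            int serverNodes, String jobId) throws IOException {
            Path complete = new Path("/crawl/complete/" + jobId);   // hypothetical layout
            FSDataOutputStream out = fs.create(complete);
            out.write(data);
            out.close();

            for (int i = 0; i < serverNodes; i++) {
                Path split = new Path("/crawl/splits/server" + i + "/" + jobId);
                FSDataOutputStream splitOut = fs.create(split);
                splitOut.write(data);   // in reality only that server's share of the data
                splitOut.close();
            }
        }

        // Server side: read the split, then delete it, so the file only lives
        // for seconds to minutes.
        public static void consumeSplit(FileSystem fs, Path split) throws IOException {
            FSDataInputStream in = fs.open(split);
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // process the data...
            }
            in.close();
            fs.delete(split);   // single-argument delete from the 0.16-era API
        }

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            writeCrawlResult(fs, "crawled page data".getBytes(), 8, "job-0001");
            consumeSplit(fs, new Path("/crawl/splits/server0/job-0001"));
        }
    }

With replication 3, every one of those short-lived splits is still written to three datanodes and later has to be invalidated on all three, which fits the delayed block reclamation mentioned at the top of this mail.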
