Thanks for the reply. I have nothing about dfs.datanode.max.xceivers in
my hdfs-site.xml, so hopefully setting it will solve the problem.
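
Just so I add the right thing, this is what I'm planning to put in
hdfs-site.xml. As far as I know the property name in 0.20/1.x is actually
spelled dfs.datanode.max.xcievers (with the transposed i/e), and 4096 is
only a commonly suggested starting value rather than anything from this
thread, so please correct me if either is wrong:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>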
About ulimit -n: I'm running on an NFS cluster, and I usually just start
Hadoop with a single bin/start-all.sh. Do you think I can add it with
something like bin/Datanode -ulimit n ?
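
Or, if there is no such flag, would something along these lines be the
right way to do it? This is only my sketch: I'm assuming the daemons run
as the hadoop user, and 32768 is an arbitrary placeholder value.

    # /etc/security/limits.conf (needs root; applies to new logins of the
    # hadoop user, so the datanodes have to be restarted afterwards)
    hadoop  soft  nofile  32768
    hadoop  hard  nofile  32768

    # or, raise the soft limit in conf/hadoop-env.sh, which the daemon
    # scripts source, so a plain bin/start-all.sh would pick it up
    # (this can only go up to the hard limit already allowed for the user)
    ulimit -n 32768

and then bin/stop-all.sh followed by bin/start-all.sh to restart the
datanodes.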

Mark

On Thu, Jan 26, 2012 at 7:33 AM, Mapred Learn <[email protected]> wrote:

> You need to set ulimit -n <bigger value> on the datanodes and restart them.
>
> Sent from my iPhone
>
> On Jan 26, 2012, at 6:06 AM, Idris Ali <[email protected]> wrote:
>
> > Hi Mark,
> >
> > On a lighter note, what is the count of xceivers, i.e. the
> > dfs.datanode.max.xceivers property in hdfs-site.xml?
> >
> > Thanks,
> > -idris
> >
> > On Thu, Jan 26, 2012 at 5:28 PM, Michel Segel <[email protected]> wrote:
> >
> >> Sorry, going from memory...
> >> As the hadoop, mapred, or hdfs user, what do you see when you do ulimit -a?
> >> That should give you the number of open files allowed for a single user...
> >>
> >>
> >> Sent from a remote device. Please excuse any typos...
> >>
> >> Mike Segel
> >>
> >> On Jan 26, 2012, at 5:13 AM, Mark question <[email protected]> wrote:
> >>
> >>> Hi guys,
> >>>
> >>>  I get this error from a job trying to process 3 million records.
> >>>
> >>> java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
> >>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
> >>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
> >>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> >>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
> >>>
> >>> When I checked the log file of datanode 20, I see:
> >>>
> >>> 2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369, infoPort=50075, ipcPort=50020):DataXceiver
> >>> java.io.IOException: Connection reset by peer
> >>>   at sun.nio.ch.FileDispatcher.read0(Native Method)
> >>>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> >>>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> >>>   at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> >>>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> >>>   at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
> >>>   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> >>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> >>>   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> >>>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> >>>   at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >>>   at java.io.DataInputStream.read(DataInputStream.java:132)
> >>>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
> >>>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
> >>>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
> >>>   at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
> >>>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
> >>>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
> >>>   at java.lang.Thread.run(Thread.java:662)
> >>>
> >>>
> >>> This is because I'm running 10 maps per TaskTracker on a 20-node cluster,
> >>> and each map opens about 300 files, so that should give 6000 open files at
> >>> the same time. Why is this a problem? The maximum number of files per
> >>> process on one machine is:
> >>>
> >>> cat /proc/sys/fs/file-max   ---> 2403545
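> >>>
> >>> (To compare the two limits that could matter here, these are the checks
> >>> I have in mind; that ulimit -n is the one the DataNode actually hits is
> >>> just my guess:)
> >>>
> >>>    # system-wide cap on open file handles for the whole machine
> >>>    cat /proc/sys/fs/file-max
> >>>
> >>>    # per-process soft and hard limits for the current user's shell;
> >>>    # my guess is this is the limit the DataNode runs into
> >>>    ulimit -n
> >>>    ulimit -Hn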
> >>>
> >>>
> >>> Any suggestions?
> >>>
> >>> Thanks,
> >>> Mark
> >>
>
