Mark You have this "Connection reset by peer" error. Why do you think the problem is related to too many open files?
Raj
> ________________________________
> From: Mark question <[email protected]>
> To: [email protected]
> Sent: Thursday, January 26, 2012 11:10 AM
> Subject: Re: Too many open files Error
>
> Hi again,
> I've tried:
>
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>1048576</value>
>   </property>
>
> but I'm still getting the same error ... how high can I go??
>
> Thanks,
> Mark
>
> On Thu, Jan 26, 2012 at 9:29 AM, Mark question <[email protected]> wrote:
>
>> Thanks for the reply.... I have nothing about dfs.datanode.max.xceivers in
>> my hdfs-site.xml, so hopefully this will solve the problem. About ulimit -n:
>> I'm running on an NFS cluster, so usually I just start Hadoop with a single
>> bin/start-all.sh ... Do you think I can add it by bin/Datanode -ulimit n?
>>
>> Mark
>>
>> On Thu, Jan 26, 2012 at 7:33 AM, Mapred Learn <[email protected]> wrote:
>>
>>> You need to set ulimit -n <bigger value> on the datanodes and restart them.
>>>
>>> Sent from my iPhone
>>>
>>> On Jan 26, 2012, at 6:06 AM, Idris Ali <[email protected]> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> On a lighter note, what is the count of xceivers? The
>>>> dfs.datanode.max.xceivers property in hdfs-site.xml?
>>>>
>>>> Thanks,
>>>> -idris
>>>>
>>>> On Thu, Jan 26, 2012 at 5:28 PM, Michel Segel <[email protected]> wrote:
>>>>
>>>>> Sorry, going from memory...
>>>>> As user hadoop or mapred or hdfs, what do you see when you do a ulimit -a?
>>>>> That should give you the number of open files allowed for a single user...
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
>>>>> On Jan 26, 2012, at 5:13 AM, Mark question <[email protected]> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I get this error from a job trying to process 3 million records:
>>>>>>
>>>>>> java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
>>>>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
>>>>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
>>>>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>>>>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>>>>>>
>>>>>> When I checked the log file of datanode-20, I see:
>>>>>>
>>>>>> 2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>>>> DatanodeRegistration(192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369,
>>>>>> infoPort=50075, ipcPort=50020):DataXceiver
>>>>>> java.io.IOException: Connection reset by peer
>>>>>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>>>>>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>>>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>>>>>     at java.io.DataInputStream.read(DataInputStream.java:132)
>>>>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
>>>>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
>>>>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
>>>>>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
>>>>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>>>>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>>
>>>>>> This is because I'm running 10 maps per TaskTracker on a 20-node cluster,
>>>>>> and each map opens about 300 files, which should give about 6000 open files
>>>>>> at the same time ... why is this a problem? The maximum # of files per
>>>>>> process on one machine is:
>>>>>>
>>>>>> cat /proc/sys/fs/file-max ---> 2403545
>>>>>>
>>>>>> Any suggestions?
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
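
A minimal sketch of the ulimit checks being discussed, assuming a Linux DataNode host where the Hadoop daemons run as a user named hdfs (the user name, limit values, and paths are assumptions, not taken from the thread). The point is that ulimit is a shell setting inherited by whatever process that shell launches, so it has to be raised in the environment that starts the DataNode rather than passed as a DataNode flag:

    # Check the soft and hard open-file ("nofile") limits for the user that
    # actually launches the DataNode -- this, not fs.file-max, is the limit
    # behind "Too many open files":
    su - hdfs -c 'ulimit -Sn; ulimit -Hn'

    # Raise it persistently in /etc/security/limits.conf (example values):
    #   hdfs  soft  nofile  32768
    #   hdfs  hard  nofile  65536
    # then log in again as that user and restart the daemons so the new
    # limit is inherited by the DataNode JVMs:
    bin/stop-all.sh && bin/start-all.sh

    # Verify what the running DataNode actually inherited:
    DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -1)
    grep 'Max open files' /proc/$DN_PID/limits

One note on the property in the quoted config: in the 0.20/1.x releases the DataNode reads the misspelled name dfs.datanode.max.xcievers (as in Mark's snippet); the correctly spelled dfs.datanode.max.xceivers mentioned later in the thread would be silently ignored by those versions.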

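A note on the numbers in the last message: the per-process nofile limit (often 1024 by default on Linux installs of that era) applies to each JVM separately, so a single map task holding ~300 files is modest on its own. The process that tends to hit the limit is the DataNode, which holds sockets and block files for every concurrent writer across the whole job (here up to 20 nodes x 10 maps writing at once), while /proc/sys/fs/file-max is only the system-wide ceiling and is rarely the binding constraint. A sketch of how one might check which process is close to its limit, assuming a 0.20/1.x cluster (the process patterns, mapred.Child for map-task JVMs and the DataNode class from the stack trace, are assumptions):

    # System-wide: handles in use vs. the ceiling
    cat /proc/sys/fs/file-nr     # in-use, free, max
    cat /proc/sys/fs/file-max

    # Per-process: open descriptors vs. the soft limit, for the DataNode and
    # any map-task child JVMs on this node (run as root or as the owning user)
    for pid in $(pgrep -f 'datanode.DataNode|mapred.Child'); do
        fds=$(ls /proc/$pid/fd 2>/dev/null | wc -l)
        soft=$(awk '/Max open files/ {print $4}' /proc/$pid/limits)
        echo "pid $pid: $fds open fds (soft limit $soft)"
    done

If these counts are nowhere near the soft limits, that supports the question at the top of this exchange: "Connection reset by peer" on the DataNode only says the remote end closed the socket, and the xceiver ceiling (dfs.datanode.max.xcievers, which defaulted to 256 in the 0.20 line) is another limit worth checking before pushing ulimit higher.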