Thanks for the reply... I don't have dfs.datanode.max.xceivers set in my hdfs-site.xml at all, so hopefully adding it will solve the problem. About ulimit -n: I'm running on an NFS cluster, so I usually just start Hadoop with a single bin/start-all.sh... Do you think I can set it with something like bin/Datanode -ulimit n?

Mark
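As far as I know, ulimit -n is a per-user shell limit that the daemons inherit when they are launched, not an option the datanode script accepts, so the usual route is to raise the limit for the user that runs the datanodes and then restart them. A minimal sketch, assuming the daemons run as a dedicated "hadoop" user and using 32768 purely as an illustrative value:

    # On every datanode, as root, raise the open-file limit for the hadoop
    # user by adding these two lines to /etc/security/limits.conf:
    #
    #   hadoop  soft  nofile  32768
    #   hadoop  hard  nofile  32768

    # Then log in again as the hadoop user, check that the new limit took
    # effect, and restart the daemons so they inherit it:
    ulimit -n                        # should now report 32768
    bin/stop-all.sh && bin/start-all.sh

A "ulimit -n <value>" line in conf/hadoop-env.sh should also work, but only up to whatever hard limit the user already has.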
On Thu, Jan 26, 2012 at 7:33 AM, Mapred Learn <[email protected]> wrote:

> U need to set ulimit -n <bigger value> on datanode and restart datanodes.
>
> Sent from my iPhone
>
> On Jan 26, 2012, at 6:06 AM, Idris Ali <[email protected]> wrote:
>
> > Hi Mark,
> >
> > On a lighter note, what is the count of xceivers, i.e. the
> > dfs.datanode.max.xceivers property in hdfs-site.xml?
> >
> > Thanks,
> > -idris
> >
> > On Thu, Jan 26, 2012 at 5:28 PM, Michel Segel <[email protected]> wrote:
> >
> >> Sorry, going from memory...
> >> As user hadoop or mapred or hdfs, what do you see when you do a ulimit -a?
> >> That should give you the number of open files allowed by a single user...
> >>
> >> Sent from a remote device. Please excuse any typos...
> >>
> >> Mike Segel
> >>
> >> On Jan 26, 2012, at 5:13 AM, Mark question <[email protected]> wrote:
> >>
> >>> Hi guys,
> >>>
> >>> I get this error from a job trying to process 3 million records:
> >>>
> >>> java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
> >>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
> >>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
> >>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> >>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
> >>>
> >>> When I checked the log file of datanode-20, I see:
> >>>
> >>> 2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> >>> DatanodeRegistration(192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369,
> >>> infoPort=50075, ipcPort=50020):DataXceiver
> >>> java.io.IOException: Connection reset by peer
> >>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
> >>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> >>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> >>>     at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> >>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> >>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
> >>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> >>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> >>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> >>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> >>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >>>     at java.io.DataInputStream.read(DataInputStream.java:132)
> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
> >>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
> >>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
> >>>     at java.lang.Thread.run(Thread.java:662)
> >>>
> >>> This happens while I'm running 10 maps per TaskTracker on a 20-node cluster,
> >>> and each map opens about 300 files, so that should give about 6000 open files
> >>> at the same time... why is this a problem? The maximum number of files per
> >>> process on one machine is:
> >>>
> >>> cat /proc/sys/fs/file-max ---> 2403545
> >>>
> >>> Any suggestions?
> >>>
> >>> Thanks,
> >>> Mark
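On Idris's question about the xceiver count: if nothing is set in hdfs-site.xml, the 0.20.x/1.x datanode falls back to a fairly low default (256, going from memory), and in those releases the property name is actually spelled with the transposed "xcievers". A minimal sketch of the entry, assuming a 0.20.x-era cluster and using 4096 only as a commonly suggested starting value; it goes inside the <configuration> element of conf/hdfs-site.xml on every datanode, followed by a datanode restart:

    <property>
      <!-- legacy spelling used by 0.20.x / 1.x; later releases rename this
           to dfs.datanode.max.transfer.threads -->
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>

Also worth noting: /proc/sys/fs/file-max is the system-wide ceiling, while the limits that usually bite first are the per-user value reported by ulimit -n (often only 1024 by default) and the datanode's own xceiver cap, which is why the 2403545 figure above doesn't rule out running out of file handles or xceiver threads.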
