I think you may need to use ulimit in addition to setting 
dfs.datanode.max.xcievers. For example, on one of our boxes:

~ $ ulimit -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
open files                    (-n) 300000
pipe size          (512 bytes, -p) 10
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 16357
virtual memory        (kbytes, -v) unlimited

I used 'ulimit -n 300000' to set the maximum number of open files for the box.
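Note that 'ulimit -n' on its own only lasts for that shell session. To make the higher limit stick for the user the daemons run as, the usual route on Linux is /etc/security/limits.conf. A minimal sketch, assuming the daemons run as a 'hadoop' user (the user name and the 65536 value are assumptions, adjust for your setup):

# run as root: append per-user open-file limits (needs pam_limits; takes effect at next login)
cat >> /etc/security/limits.conf <<'EOF'
hadoop  soft  nofile  65536
hadoop  hard  nofile  65536
EOF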


On Jan 26, 2012, at 11:10 AM, Mark question wrote:

Hi again,
I've tried:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>1048576</value>
    </property>
but I'm still getting the same error ... how high can I go??

Thanks,
Mark



On Thu, Jan 26, 2012 at 9:29 AM, Mark question 
<[email protected]<mailto:[email protected]>> wrote:

Thanks for the reply... I have nothing about dfs.datanode.max.xcievers in
my hdfs-site.xml, so hopefully this will solve the problem. As for
ulimit -n, I'm running on an NFS cluster, so usually I just start Hadoop
with a single bin/start-all.sh ... Do you think I can add it with something
like bin/datanode -ulimit n ?
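Side note: as far as I know the daemon scripts don't take a -ulimit flag. What usually works is raising the limit in the environment the start scripts run in, e.g. conf/hadoop-env.sh, which hadoop-daemon.sh sources before launching the datanode. A minimal sketch; the value is only an example, and a non-root user can only raise the soft limit up to the hard limit from limits.conf:

# add near the top of conf/hadoop-env.sh, then restart via bin/stop-all.sh / bin/start-all.sh
ulimit -n 65536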

Mark


On Thu, Jan 26, 2012 at 7:33 AM, Mapred Learn 
<[email protected]<mailto:[email protected]>>wrote:

You need to set ulimit -n <bigger value> on the datanodes and restart them.
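To confirm the new limit actually applies to the running datanode process (rather than just to your shell), you can read the process's own limits. A sketch, assuming a Linux /proc filesystem and that pgrep can find the DataNode JVM by its main class:

# show the soft/hard open-file limits the datanode process is actually running with
pid=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -1)
grep 'open files' /proc/$pid/limits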

Sent from my iPhone

On Jan 26, 2012, at 6:06 AM, Idris Ali 
<[email protected]<mailto:[email protected]>> wrote:

Hi Mark,

On a lighter note, what is the count of xceivers, i.e. the
dfs.datanode.max.xcievers property in your hdfs-site.xml?
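If you mean the number actually in use rather than the configured cap, one rough way is to count DataXceiver threads in a thread dump of the datanode. A sketch, assuming jstack is on the PATH and is run as the datanode's own user:

# rough count of active DataXceiver threads on this datanode
jstack $(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -1) | grep -c DataXceiver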

Thanks,
-idris

On Thu, Jan 26, 2012 at 5:28 PM, Michel Segel <
[email protected]<mailto:[email protected]>>wrote:

Sorry, going from memory...
As the hadoop, mapred, or hdfs user, what do you see when you do a
ulimit -a? That should give you the number of open files allowed for a
single user...
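Concretely, since it is the daemon user's limit that matters rather than your own shell's, something like this shows it. A sketch, assuming the datanode runs as an 'hdfs' user on that box (the user name is an assumption):

# open-file limit as seen by the (assumed) daemon user
su -s /bin/bash hdfs -c 'ulimit -n'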


Sent from a remote device. Please excuse any typos...

Mike Segel

On Jan 26, 2012, at 5:13 AM, Mark question 
<[email protected]<mailto:[email protected]>>
wrote:

Hi guys,

I get this error from a job trying to process 3 million records.

java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

When I checked the log file of datanode-20, I see:

2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcher.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
 at sun.nio.ch.IOUtil.read(IOUtil.java:175)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
 at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
 at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at java.io.DataInputStream.read(DataInputStream.java:132)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
 at java.lang.Thread.run(Thread.java:662)


Which is because I'm running 10 maps per tasktracker on a 20-node cluster,
and each map opens about 300 files, so that should give 6000 open files at
the same time ... why is this a problem? The maximum number of files per
process on one machine is:

cat /proc/sys/fs/file-max   ---> 2403545
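For what it's worth, /proc/sys/fs/file-max is the system-wide kernel cap. The limits that usually bite first are the per-process ulimit -n on each node (often only 1024 by default) and the datanode's own DataXceiver thread cap, dfs.datanode.max.xcievers (which defaults to just 256 in these Hadoop versions, if I remember right). A few thousand simultaneous block streams per datanode can exhaust either of those long before file-max, which would match the "Bad connect ack" / "Connection reset by peer" errors above. A quick check of both:

ulimit -n                    # per-process soft limit for the current user, often 1024 by default
cat /proc/sys/fs/file-max    # system-wide kernel cap, rarely the first thing to run out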


Any suggestions?

Thanks,
Mark




