Hey there,
I've been running a cluster for about a year (around 20 machines). I've run
many concurrent jobs on it, some of them using MultipleOutputs, and never had
any problem (those jobs were creating just 3 or 4 different outputs).
Now I have a job whose MultipleOutputs creates 100 different outputs, and it
always ends up failing.
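
For context, the relevant part of the job looks roughly like this (a
simplified sketch using the old mapred API; the class name, the out0..out99
output names, and the key-based routing are just illustrative, not my exact
code):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class ManyOutputsReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    private MultipleOutputs mos;

    // Job setup: register ~100 named outputs on the JobConf.
    public static void addNamedOutputs(JobConf conf) {
        for (int i = 0; i < 100; i++) {
            MultipleOutputs.addNamedOutput(conf, "out" + i,
                    TextOutputFormat.class, Text.class, Text.class);
        }
    }

    public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
    }

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Route each key to one of the 100 named outputs; the first use of a
        // named output opens a dedicated record writer (an HDFS output
        // stream) that stays open until close().
        String name = "out" + ((key.hashCode() & Integer.MAX_VALUE) % 100);
        while (values.hasNext()) {
            mos.getCollector(name, reporter).collect(key, values.next());
        }
    }

    public void close() throws IOException {
        mos.close(); // closes all the underlying record writers
    }
}

As far as I understand it, each named output keeps its own record writer (and
so its own open HDFS stream) for the whole life of the task, so with 100
outputs per task and many tasks running concurrently the number of open
streams and sockets on each datanode grows very quickly.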
Tasks start throwing these errors:

java.io.IOException: Bad connect ack with firstBadLink 10.2.0.154:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2963)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)


or:
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2961)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)


Checking the datanode log, I see this error hundreds of times:
2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_3368446040000470452_29464903
2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_3368446040000470452_29464903 received exception java.net.SocketException: Too many open files
2012-02-23 14:22:56,008 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketException: Too many open files
        at sun.nio.ch.Net.socket0(Native Method)
        at sun.nio.ch.Net.socket(Net.java:97)
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
2012-02-23 14:22:56,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-2698946892792040969_29464904 src: /10.2.0.156:40969 dest: /10.2.0.156:50010
2012-02-23 14:22:56,035 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-2698946892792040969_29464904 received exception java.net.SocketException: Too many open files
2012-02-23 14:22:56,035 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketException: Too many open files
        at sun.nio.ch.Net.socket0(Native Method)
        at sun.nio.ch.Net.socket(Net.java:97)
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)


I've always had this configured in hdfs-site.xml:
        <property>
                <name>dfs.datanode.max.xcievers</name>
                <value>4096</value>
        </property>

But I think that is no longer enough to handle that many MultipleOutputs
streams. If I increase dfs.datanode.max.xcievers even further, what are the
side effects? What value should be considered the maximum? (I suppose it
depends on CPU and RAM, but roughly.)

Thanks in advance.

