Hey there, I've been running a cluster of about 20 machines for roughly a year. I've run many concurrent jobs on it, some of them using MultipleOutputs, and never had any problems (those jobs were creating just 3 or 4 different outputs). Now I have a job whose MultipleOutputs creates 100 different outputs, and it always ends up failing.
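For context, the job follows roughly this pattern. This is a simplified sketch with made-up names, assuming the old mapred-API MultipleOutputs; the real code is more involved:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class MyReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        // Called from the driver: registers the ~100 named outputs up front.
        // (Names here are illustrative only.)
        public static void registerOutputs(JobConf conf) {
            for (int i = 0; i < 100; i++) {
                MultipleOutputs.addNamedOutput(conf, "out" + i,
                        TextOutputFormat.class, Text.class, Text.class);
            }
        }

        private MultipleOutputs mos;

        @Override
        public void configure(JobConf conf) {
            mos = new MultipleOutputs(conf);
        }

        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // Route each key to one of the ~100 named outputs (illustrative),
            // so a single task ends up holding many writers open at once.
            String name = "out" + (Math.abs(key.hashCode()) % 100);
            while (values.hasNext()) {
                mos.getCollector(name, reporter).collect(key, values.next());
            }
        }

        @Override
        public void close() throws IOException {
            mos.close();  // flushes and closes all open named-output writers
        }
    }

The point is just that every reduce task routes records across ~100 named outputs, so each task keeps roughly that many HDFS output files open at the same time.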
The tasks start throwing errors like this:

java.io.IOException: Bad connect ack with firstBadLink 10.2.0.154:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2963)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)

or:

java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2961)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)

Checking the datanode log, I see this error hundreds of times:

2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_3368446040000470452_29464903
2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_3368446040000470452_29464903 received exception java.net.SocketException: Too many open files
2012-02-23 14:22:56,008 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketException: Too many open files
        at sun.nio.ch.Net.socket0(Native Method)
        at sun.nio.ch.Net.socket(Net.java:97)
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
2012-02-23 14:22:56,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-2698946892792040969_29464904 src: /10.2.0.156:40969 dest: /10.2.0.156:50010
2012-02-23 14:22:56,035 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-2698946892792040969_29464904 received exception java.net.SocketException: Too many open files
2012-02-23 14:22:56,035 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketException: Too many open files
        at sun.nio.ch.Net.socket0(Native Method)
        at sun.nio.ch.Net.socket(Net.java:97)
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
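If I'm reasoning about it correctly, the numbers add up quickly: every file a task keeps open through MultipleOutputs holds an HDFS write pipeline alive, and each active block write ties up a DataXceiver thread plus several file descriptors (sockets, block file, meta file) on every datanode in its pipeline. Very roughly:

    active block writes per datanode ~= (concurrent tasks) x (open outputs per task) x (replication) / (number of datanodes)

If, just for illustration, 40 reduce tasks run concurrently with 100 open outputs each and replication 3 on my 20 nodes, that is 40 x 100 x 3 / 20 = 600 active writes per datanode, each needing several descriptors, on top of whatever else is running. (The 40 is a made-up number for the example, not my actual slot count.)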
I've always had this configured in hdfs-site.xml:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

But I think that is no longer enough to handle this many MultipleOutputs. If I increase dfs.datanode.max.xcievers even further, what are the side effects? And what value should be considered a reasonable maximum? (I suppose it depends on CPU and RAM, but approximately.)

Thanks in advance.
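P.S. In case it helps frame the question, this is the kind of change I have in mind (the value is only a guess on my part, not something I've tested):

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>

together with raising the open-file limit (nofile) for the user that runs the datanode well above the usual default of 1024, since the "Too many open files" in the log is the OS per-process descriptor limit rather than the xceiver limit itself.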