Raghu,
Apologies for the confusion. I was seeing the problem with any setting for
dfs.datanode.max.xcievers... 1k, 2k and 8k. Likewise, I was also seeing the
problem with different open file settings, all the way up to 32k.
Since I installed the patch, HDFS has been performing much better. The
current settings that work for me are 16k max open files with
dfs.datanode.max.xcievers=8k, though under heavy balancer load I do start
to hit the 16k max.

Regards,
Sean

2009/2/13 Raghu Angadi <rang...@yahoo-inc.com>

> Sean,
>
> A few things in your messages are not clear to me. Currently this is
> what I make out of it:
>
> 1) with 1k limit, you do see the problem.
> 2) with 16k limit - (?) not clear if you see the problem
> 3) with 8k you don't see the problem
> 3a) with or without the patch, I don't know.
>
> But if you do use the patch and things do improve, please let us know.
>
> Raghu.
>
>
> Sean Knapp wrote:
>
>> Raghu,
>> Thanks for the quick response. I've been beating up on the cluster for
>> a while now and so far so good. I'm still at 8k... what should I expect
>> to find with 16k versus 1k? The 8k didn't appear to be affecting things
>> to begin with.
>>
>> Regards,
>> Sean
>>
>> On Thu, Feb 12, 2009 at 2:07 PM, Raghu Angadi <rang...@yahoo-inc.com>
>> wrote:
>>
>>> You are most likely hit by
>>> https://issues.apache.org/jira/browse/HADOOP-4346 . I hope it gets
>>> backported. There is a 0.18 patch posted there.
>>>
>>> btw, does 16k help in your case?
>>>
>>> Ideally 1k should be enough (with a small number of clients). Please
>>> try the above patch with the 1k limit.
>>>
>>> Raghu.
>>>
>>>
>>> Sean Knapp wrote:
>>>
>>>> Hi all,
>>>> I'm continually running into the "Too many open files" error on 18.3:
>>>>
>>>> DataXceiveServer: java.io.IOException: Too many open files
>>>>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>>>>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>>>>     at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:96)
>>>>     at org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:997)
>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> I'm writing thousands of files in the course of a few minutes, but
>>>> nothing that seems too unreasonable, especially given the numbers
>>>> below. I begin getting a surge of these warnings right as I hit 1024
>>>> files open by the DataNode:
>>>>
>>>> had...@u10:~$ ps ux | awk '/dfs\.DataNode/ { print $2 }' | xargs -i ls /proc/{}/fd | wc -l
>>>> 1023
>>>>
>>>> This is a bit unexpected, however, since I've configured my open file
>>>> limit to be 16k:
>>>>
>>>> had...@u10:~$ ulimit -a
>>>> core file size          (blocks, -c) 0
>>>> data seg size           (kbytes, -d) unlimited
>>>> scheduling priority             (-e) 0
>>>> file size               (blocks, -f) unlimited
>>>> pending signals                 (-i) 268288
>>>> max locked memory       (kbytes, -l) 32
>>>> max memory size         (kbytes, -m) unlimited
>>>> open files                      (-n) 16384
>>>> pipe size            (512 bytes, -p) 8
>>>> POSIX message queues     (bytes, -q) 819200
>>>> real-time priority              (-r) 0
>>>> stack size              (kbytes, -s) 8192
>>>> cpu time               (seconds, -t) unlimited
>>>> max user processes              (-u) 268288
>>>> virtual memory          (kbytes, -v) unlimited
>>>> file locks                      (-x) unlimited
>>>>
>>>> Note, I've also set dfs.datanode.max.xcievers to 8192 in
>>>> hadoop-site.xml.
>>>>
>>>> Thanks in advance,
>>>> Sean
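
For reference, the xciever ceiling discussed in this thread is set in
hadoop-site.xml on each datanode. A minimal sketch of the entry, using the
8192 value Sean settled on (the property name really does carry Hadoop's
historical misspelling of "xceivers"):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8192</value>
    </property>

The datanode reads its configuration at startup, so it needs a restart to
pick up the new value.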
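Likewise, the 16k open-file limit is usually made persistent in
/etc/security/limits.conf rather than with a one-off ulimit call. A sketch,
assuming the datanode runs under an account named "hadoop" (the account
name here is an assumption, not something stated in the thread):

    # /etc/security/limits.conf
    # raise soft and hard fd ceilings for the datanode's account
    # ("hadoop" is an assumed user name -- substitute your own)
    hadoop  soft  nofile  16384
    hadoop  hard  nofile  16384

After logging in again, ulimit -n should report 16384. Note that
limits.conf is applied through PAM at login, so a daemon started outside a
login session may not inherit the raised limit.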