Hi, I've noticed that some nodes in our cluster are dying after running for a while:
WARN [New I/O server boss #17] 2013-10-29 12:22:20,725 Slf4JLogger.java (line 76) Failed to accept a connection.
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

There are other exceptions with the same root cause. Since we use the Cassandra package, the nofile limit is raised to 100000. To double-check that this is correct:

root@de-cass09 ~ # cat /proc/18332/limits
Limit                     Soft Limit           Hard Limit           Units
...
Max open files            100000               100000               files
...

Now I check how many files are actually open:

root@de-cass09 ~ # lsof -n -p 18332 | wc -l
100038

This seems like an awful lot for size-tiered compaction... ? When I went through the list, I noticed one (deleted) file showed up over and over:

...
java 18332 cassandra 4704r REG 8,1 10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
java 18332 cassandra 4705r REG 8,1 10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
...

In fact, if I count the entries for this specific file:

root@de-cass09 ~ # lsof -n -p 18332 | grep mapdata040-hos-jb-7648-Data.db | wc -l
52707

Other nodes have around 350 files open in total... Any idea why this nofile count is so high?

The first exception I see is this:

WARN [New I/O worker #8] 2013-10-29 12:09:34,440 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.NullPointerException
        at sun.nio.ch.EPollArrayWrapper.setUpdateEvents(EPollArrayWrapper.java:178)
        at sun.nio.ch.EPollArrayWrapper.add(EPollArrayWrapper.java:227)
        at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:164)
        at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)
        at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:209)
        at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:151)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

Several minutes later I get the "Too many open files" errors.

Specs: 12-node cluster running Ubuntu 12.04 LTS and Cassandra 2.0.1 (DataStax packages), using a JBOD of 2 disks. JNA is enabled.

Any suggestions?

Kind regards,
Pieter Callewaert

Pieter Callewaert
Web & IT engineer
Web: www.be-mobile.be
Email: pieter.callewa...@be-mobile.be
Tel: +32 9 330 51 80
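PS: In case anyone wants to check their own nodes, this is roughly how I'm counting the leaked handles. A rough sketch only; the PID (18332 here) is from our node, and lsof's column layout may differ slightly between versions:

# Count open handles that point at files already unlinked from disk;
# lsof tags these with "(deleted)" at the end of the line.
lsof -n -p 18332 | grep -c '(deleted)'

# Group the deleted handles by path to see which SSTable is leaking most.
# $(NF-1) is the path column, since "(deleted)" is the last field.
lsof -n -p 18332 | grep '(deleted)' | awk '{print $(NF-1)}' | sort | uniq -c | sort -rn | head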