Hi,

I've noticed that some nodes in our cluster die after running for a while.

WARN [New I/O server boss #17] 2013-10-29 12:22:20,725 Slf4JLogger.java (line 76) Failed to accept a connection.
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

I also see other exceptions related to the same cause.
Now, since we use the Cassandra package, the nofile limit is raised to 100000.
To double-check that this is actually applied:

root@de-cass09 ~ # cat /proc/18332/limits
Limit                     Soft Limit           Hard Limit           Units
...
Max open files            100000               100000               files
...
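For reference, if I'm not mistaken the package sets this via /etc/security/limits.d/cassandra.conf, with roughly the following contents (exact values may differ per package version):

cassandra - memlock unlimited
cassandra - nofile 100000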

Now I check how many files are open:
root@de-cass09 ~ # lsof -n -p 18332 | wc -l
100038
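In case it's useful, a rough breakdown by descriptor type (the TYPE column in the lsof output) can be had with something like this, where 18332 is just this node's PID and NR > 1 skips the header line:

lsof -n -p 18332 | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn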

This seems like an awful lot for size-tiered compaction...?
When I went through the list, I noticed that one (deleted) file appears over and over:

...
java    18332 cassandra 4704r   REG                8,1  10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
java    18332 cassandra 4705r   REG                8,1  10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
...

In fact, if I count the entries for this specific file:
root@de-cass09 ~ # lsof -n -p 18332 | grep mapdata040-hos-jb-7648-Data.db | wc -l
52707
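To see whether other (deleted) SSTables are being kept open as well, something along these lines should group the deleted entries by path (untested one-liner; the path is the second-to-last field in the lsof output above):

lsof -n -p 18332 | awk '/\(deleted\)/ {print $(NF-1)}' | sort | uniq -c | sort -rn | head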

Other nodes have around 350 open files in total... Any idea why the number of open files is so high on this node?

The first exception I see is this:
WARN [New I/O worker #8] 2013-10-29 12:09:34,440 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.NullPointerException
        at sun.nio.ch.EPollArrayWrapper.setUpdateEvents(EPollArrayWrapper.java:178)
        at sun.nio.ch.EPollArrayWrapper.add(EPollArrayWrapper.java:227)
        at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:164)
        at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)
        at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:209)
        at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:151)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

Several minutes later I start getting the "Too many open files" errors.

Specs:
12-node cluster running Ubuntu 12.04 LTS and Cassandra 2.0.1 (DataStax packages), using JBOD with 2 disks.
JNA is enabled.

Any suggestions?

Kind regards,
Pieter Callewaert


   Pieter Callewaert
   Web & IT engineer

   Web:   www.be-mobile.be
   Email: pieter.callewa...@be-mobile.be
   Tel:  + 32 9 330 51 80



