[ https://issues.apache.org/jira/browse/CASSANDRA-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810590#comment-13810590 ]
Mikhail Stepura commented on CASSANDRA-6275:
--------------------------------------------

bq. Also, when that happens it's not always possible to shutdown server process via SIGTERM. Have to use SIGKILL.

As far as I understand, here is *what* is happening:
* {{SIGTERM handler}} waits for {{StorageServiceShutdownHook}}.
* {{StorageServiceShutdownHook}} waits (up to *3600 sec == 1hr*) for {{mutationStage}} threads to complete.
* {{"MutationStage:2718"}} thread performs {{ColumnFamilyStore.forceBlockingFlush}} initiated by {{TruncateVerbHandler.doVerb}} and waits for {{"MemtablePostFlusher:1"}}.
* {{"MemtablePostFlusher:1"}} is waiting on {{CountDownLatch.await}} (in the {{WrappedRunnable}} returned from {{ColumnFamilyStore.switchMemtable}}). It will wait until the latch is counted down to zero.

There is also another call to {{ColumnFamilyStore.forceBlockingFlush}} from {{"OptionalTasks:1":BatchlogManager.cleanup()}}.

> 2.0.x leaks file handles
> ------------------------
>
>                 Key: CASSANDRA-6275
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6275
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: java version "1.7.0_25"
> Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
> Linux cassandra-test1 2.6.32-279.el6.x86_64 #1 SMP Thu Jun 21 15:00:18 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Mikhail Mazursky
>         Attachments: cassandra_jstack.txt, slog.gz
>
>
> Looks like C* is leaking file descriptors when doing lots of CAS operations.
> {noformat}
> $ sudo cat /proc/15455/limits
> Limit                     Soft Limit           Hard Limit           Units
> Max cpu time              unlimited            unlimited            seconds
> Max file size             unlimited            unlimited            bytes
> Max data size             unlimited            unlimited            bytes
> Max stack size            10485760             unlimited            bytes
> Max core file size        0                    0                    bytes
> Max resident set          unlimited            unlimited            bytes
> Max processes             1024                 unlimited            processes
> Max open files            4096                 4096                 files
> Max locked memory         unlimited            unlimited            bytes
> Max address space         unlimited            unlimited            bytes
> Max file locks            unlimited            unlimited            locks
> Max pending signals       14633                14633                signals
> Max msgqueue size         819200               819200               bytes
> Max nice priority         0                    0
> Max realtime priority     0                    0
> Max realtime timeout      unlimited            unlimited            us
> {noformat}
> Looks like the problem is not in limits.
> Before load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 166
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 164
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 180
> {noformat}
> After load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 967
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 1766
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 2578
> {noformat}
> Most opened files have names like:
> {noformat}
> java 16890 cassandra 1636r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1637r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1638r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1639r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1640r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1641r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1642r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1643r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1644r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1645r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1646r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1647r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1648r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1649r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1650r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1651r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1652r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1653r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1654r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1655r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1656r REG 202,17  88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> {noformat}
> Also, when that happens it's not always possible to shutdown server process via SIGTERM. Have to use SIGKILL.
> p.s. See mailing thread for more context information
> https://www.mail-archive.com/user@cassandra.apache.org/msg33035.html

--
This message was sent by Atlassian JIRA (v6.1#6144)
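
The wait chain described in the comment (a shutdown path that waits with a timeout for a stage whose task is blocked on a {{CountDownLatch}}) can be reproduced in miniature with plain {{java.util.concurrent}}. The following is only a sketch under stated assumptions, not Cassandra code: the class name {{ShutdownWaitChain}}, the method {{drainStage}}, and the variable names are hypothetical, and the timeout is a few hundred milliseconds instead of the 3600 seconds mentioned above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from the Cassandra code base.
public class ShutdownWaitChain {

    /**
     * Models a shutdown path that waits for a "stage" (an executor) to drain.
     * The single task in the stage blocks on a latch, standing in for the
     * flush that waits on the post-flush task.
     *
     * @return true if the stage drained within the timeout, false otherwise
     */
    static boolean drainStage(boolean releaseLatch, long timeoutMillis)
            throws InterruptedException {
        ExecutorService stage = Executors.newSingleThreadExecutor();
        CountDownLatch postFlush = new CountDownLatch(1);

        // The task cannot finish until the latch reaches zero.
        stage.execute(() -> {
            try {
                postFlush.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        if (releaseLatch) {
            postFlush.countDown(); // healthy case: something counts the latch down
        }

        // Shutdown path: stop accepting work, then wait for running tasks.
        stage.shutdown();
        boolean drained = stage.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        if (!drained) {
            // Only for this demo: interrupt the blocked task so the JVM can exit.
            stage.shutdownNow();
        }
        return drained;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("latch released:       drained=" + drainStage(true, 2000));
        System.out.println("latch never released: drained=" + drainStage(false, 200));
    }
}
```

When the latch is never counted down, the waiting side simply sits out its full timeout. With a one-hour timeout, that would look like a process ignoring SIGTERM, which is consistent with the behaviour reported in the issue.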