[ 
https://issues.apache.org/jira/browse/CASSANDRA-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810590#comment-13810590
 ] 

Mikhail Stepura commented on CASSANDRA-6275:
--------------------------------------------

bq. Also, when that happens it's not always possible to shutdown server process 
via SIGTERM. Have to use SIGKILL.

As far as I understand here *what* is happening

* {{SIGTERM handler}} waits for {{StorageServiceShutdownHook}} 
* {{StorageServiceShutdownHook}} waits (up to *3600 sec == 1hr*) for 
{{mutationStage}} threads to complete. 
* {{"MutationStage:2718"}} thread performs 
{{ColumnFamilyStore.forceBlockingFlush}} initiated by 
{{TruncateVerbHandler.doVerb}} and waits for {{"MemtablePostFlusher:1"}} 
* {{"MemtablePostFlusher:1"}} is waiting on {{CountDownLatch.await}} (in 
{{WrappedRunnable}} returned from {{ColumnFamilyStore.switchMemtable)}}.  It 
will wait until the latch is counted down to zero.

There is also another call to {{ColumnFamilyStore.forceBlockingFlush}} from 
{{"OptionalTasks:1":BatchlogManager.cleanup()}} . 


> 2.0.x leaks file handles
> ------------------------
>
>                 Key: CASSANDRA-6275
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6275
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: java version "1.7.0_25"
> Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
> Linux cassandra-test1 2.6.32-279.el6.x86_64 #1 SMP Thu Jun 21 15:00:18 EDT 
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Mikhail Mazursky
>         Attachments: cassandra_jstack.txt, slog.gz
>
>
> Looks like C* is leaking file descriptors when doing lots of CAS operations.
> {noformat}
> $ sudo cat /proc/15455/limits
> Limit                     Soft Limit           Hard Limit           Units    
> Max cpu time              unlimited            unlimited            seconds  
> Max file size             unlimited            unlimited            bytes    
> Max data size             unlimited            unlimited            bytes    
> Max stack size            10485760             unlimited            bytes    
> Max core file size        0                    0                    bytes    
> Max resident set          unlimited            unlimited            bytes    
> Max processes             1024                 unlimited            processes
> Max open files            4096                 4096                 files    
> Max locked memory         unlimited            unlimited            bytes    
> Max address space         unlimited            unlimited            bytes    
> Max file locks            unlimited            unlimited            locks    
> Max pending signals       14633                14633                signals  
> Max msgqueue size         819200               819200               bytes    
> Max nice priority         0                    0                   
> Max realtime priority     0                    0                   
> Max realtime timeout      unlimited            unlimited            us 
> {noformat}
> Looks like the problem is not in limits.
> Before load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 166
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 164
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 180
> {noformat}
> After load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 967
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 1766
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 2578
> {noformat}
> Most opened files have names like:
> {noformat}
> java      16890 cassandra 1636r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1637r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1638r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1639r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1640r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1641r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1642r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1643r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1644r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1645r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1646r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1647r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1648r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1649r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1650r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1651r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1652r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1653r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1654r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java      16890 cassandra 1655r      REG             202,17 161158485     
> 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java      16890 cassandra 1656r      REG             202,17  88724987     
> 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> {noformat}
> Also, when that happens it's not always possible to shutdown server process 
> via SIGTERM. Have to use SIGKILL.
> p.s. See mailing thread for more context information 
> https://www.mail-archive.com/user@cassandra.apache.org/msg33035.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to