[
https://issues.apache.org/jira/browse/CASSANDRA-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825892#comment-13825892
]
graham sanderson commented on CASSANDRA-6275:
---------------------------------------------
Note also that most, if not all, of the deleted files are of the form:
{code}
java 14018 cassandra 586r REG 8,33 8792499 1251 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)
java 14018 cassandra 587r REG 8,33 27303760 1254 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)
java 14018 cassandra 588r REG 8,33 8792499 1251 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)
java 14018 cassandra 589r REG 8,33 27303760 1254 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)
java 14018 cassandra 590r REG 8,33 10507214 936 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4657-Data.db (deleted)
{code}
We have 7 data disks (I don't know whether this contributes to the problem), and
the number of such deleted files is very unevenly balanced: 93% are on two of
the 7 disks (on this particular node). The distribution of live data file sizes
for OpsCenter/rollups60 is also a little uneven, and the data mounts that hold
more deleted (but open) files also hold more live data. However, the deleted
file counts per mount point vary by several orders of magnitude, whereas the
live data sizes do not.
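For anyone who wants to produce the same per-mount tally on their own node, here is a minimal sketch. It assumes the standard nine-column lsof output with one entry per line, paths without spaces, and that the mount point is the first two path components (e.g. /data/1). The names tally_deleted, mount_depth and SAMPLE are hypothetical and only for illustration.

```python
from collections import Counter


def tally_deleted(lsof_lines, mount_depth=2):
    """Count deleted-but-open files per mount prefix and per file path.

    Expects lines shaped like:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (deleted)
    """
    per_mount = Counter()
    per_path = Counter()
    for line in lsof_lines:
        fields = line.split()
        # Skip anything that is not a deleted entry with a NAME column.
        if len(fields) < 10 or fields[-1] != "(deleted)":
            continue
        path = fields[8]
        # Assumption: the mount point is the first `mount_depth` components.
        # "/data/1/x".split("/") gives ["", "data", "1", "x"], hence the +1.
        mount = "/".join(path.split("/")[: mount_depth + 1])
        per_mount[mount] += 1
        per_path[path] += 1
    return per_mount, per_path


# Entries copied from this ticket; the last one is not deleted and is skipped.
_ROLLUPS = "/data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-"
SAMPLE = [
    "java 14018 cassandra 586r REG 8,33 8792499 1251 " + _ROLLUPS + "4656-Data.db (deleted)",
    "java 14018 cassandra 587r REG 8,33 27303760 1254 " + _ROLLUPS + "4655-Data.db (deleted)",
    "java 14018 cassandra 588r REG 8,33 8792499 1251 " + _ROLLUPS + "4656-Data.db (deleted)",
    "java 14018 cassandra 589r REG 8,33 27303760 1254 " + _ROLLUPS + "4655-Data.db (deleted)",
    "java 14018 cassandra 590r REG 8,33 10507214 936 " + _ROLLUPS + "4657-Data.db (deleted)",
    "java 16890 cassandra 1636r REG 202,17 88724987 655520 "
    "/var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db",
]

if __name__ == "__main__":
    mounts, paths = tally_deleted(SAMPLE)
    for mount, count in mounts.most_common():
        print(mount, count)
    for path, count in paths.most_common():
        print(count, path)
```

The per-path counter is there because the same deleted SSTable shows up under several descriptors (4656 and 4655 each appear twice above), so the count per path may be as interesting as the count per mount.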
> 2.0.x leaks file handles
> ------------------------
>
> Key: CASSANDRA-6275
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6275
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: java version "1.7.0_25"
> Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
> Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
> Linux cassandra-test1 2.6.32-279.el6.x86_64 #1 SMP Thu Jun 21 15:00:18 EDT
> 2012 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Mikhail Mazursky
> Attachments: c_file-descriptors_strace.tbz, cassandra_jstack.txt,
> leak.log, position_hints.tgz, slog.gz
>
>
> Looks like C* is leaking file descriptors when doing lots of CAS operations.
> {noformat}
> $ sudo cat /proc/15455/limits
> Limit Soft Limit Hard Limit Units
> Max cpu time unlimited unlimited seconds
> Max file size unlimited unlimited bytes
> Max data size unlimited unlimited bytes
> Max stack size 10485760 unlimited bytes
> Max core file size 0 0 bytes
> Max resident set unlimited unlimited bytes
> Max processes 1024 unlimited processes
> Max open files 4096 4096 files
> Max locked memory unlimited unlimited bytes
> Max address space unlimited unlimited bytes
> Max file locks unlimited unlimited locks
> Max pending signals 14633 14633 signals
> Max msgqueue size 819200 819200 bytes
> Max nice priority 0 0
> Max realtime priority 0 0
> Max realtime timeout unlimited unlimited us
> {noformat}
> Looks like the problem is not in limits.
> Before load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 166
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 164
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 180
> {noformat}
> After load test:
> {noformat}
> cassandra-test0 ~]$ lsof -n | grep java | wc -l
> 967
> cassandra-test1 ~]$ lsof -n | grep java | wc -l
> 1766
> cassandra-test2 ~]$ lsof -n | grep java | wc -l
> 2578
> {noformat}
> Most opened files have names like:
> {noformat}
> java 16890 cassandra 1636r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1637r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1638r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1639r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1640r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1641r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1642r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1643r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1644r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1645r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1646r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1647r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1648r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1649r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1650r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1651r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1652r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1653r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1654r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> java 16890 cassandra 1655r REG 202,17 161158485 655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
> java 16890 cassandra 1656r REG 202,17 88724987 655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
> {noformat}
> Also, when that happens it's not always possible to shut down the server
> process via SIGTERM; I have to use SIGKILL.
> P.S. See the mailing list thread for more context:
> https://www.mail-archive.com/[email protected]/msg33035.html