Jackson Chung created CASSANDRA-8874:
----------------------------------------
Summary: running out of FD, and causing clients hang when dropping
a keyspace with many CF with many sstables
Key: CASSANDRA-8874
URL: https://issues.apache.org/jira/browse/CASSANDRA-8874
Project: Cassandra
Issue Type: Bug
Reporter: Jackson Chung
We have already set the file descriptor limit to 100000 for C*, and confirmed
that from /proc/$cass_pid/limits.
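For anyone wanting to repeat that check, here is a minimal sketch that compares the number of open descriptors with the soft limit. It assumes a Linux procfs layout; the helper names (soft_fd_limit, open_fd_count, fd_report) are hypothetical and not part of Cassandra or any tool mentioned in this report.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes the Linux procfs layout.

# Soft "Max open files" limit from a limits file (e.g. /proc/<pid>/limits).
# The row looks like: "Max open files  <soft>  <hard>  files".
soft_fd_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Number of entries in an fd directory (e.g. /proc/<pid>/fd),
# one entry per open descriptor.
open_fd_count() {
    ls -1 "$1" | wc -l | tr -d ' '
}

# Print "open=<n> limit=<m>" for a process directory such as /proc/12345.
fd_report() {
    echo "open=$(open_fd_count "$1/fd") limit=$(soft_fd_limit "$1/limits")"
}
```

Usage would be something like `fd_report "/proc/$cass_pid"`, run as the Cassandra user or root, since reading another user's /proc/<pid>/fd requires privileges.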
We have 16 nodes across 2 DCs; each node stores about 600GB to 1TB of data.
They run on EC2 i2.2xlarge instances, with the 2 disks in RAID0.
We use both the Hector and DataStax drivers, and there are many clients
connecting to the cluster.
One day we dropped a keyspace (one that our app no longer uses), which has a
good number of CFs; some of them use leveled compaction and have a good number
of sstables. Our app then went down. CPU and load average were high on the
nodes, and we couldn't even ssh to them. We had to force a reboot and restart
C* on 2 of them; their logs were filled with (hundreds of thousands of)
"too many open files" errors.
C* version: 2.0.11
{noformat}$ grep -ic "caused by.*too many open file" system.log.*
system.log.1:0
system.log.10:18659
system.log.11:17539
system.log.12:18941
system.log.13:18936
system.log.14:18601
system.log.15:18933
system.log.16:18937
system.log.17:18954
system.log.18:18892
system.log.19:18942
system.log.2:0
system.log.20:18977
system.log.21:18977
system.log.22:18852
system.log.23:18978
system.log.24:18978
system.log.25:18978
system.log.26:18978
system.log.27:18978
system.log.28:18978
system.log.29:18978
system.log.3:654
system.log.30:18978
system.log.31:18978
system.log.32:18978
system.log.33:18977
system.log.34:18978
system.log.35:18978
system.log.36:17943
system.log.37:18867
system.log.38:15082
system.log.39:17766
system.log.4:17932
system.log.40:18029
system.log.41:18890
system.log.42:18048
system.log.43:18812
system.log.44:18787
system.log.45:18962
system.log.46:18978
system.log.47:18978
system.log.48:18978
system.log.49:18978
system.log.5:15284
system.log.50:18978
system.log.6:17180
system.log.7:17286
system.log.8:18651
system.log.9:17720
{noformat}
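The per-file counts above come out in lexical rather than numeric order. As a rough sketch, `file:count` output like this can be totalled and reordered by log number as follows. The function names are hypothetical, and the sort assumes the `system.log.<n>:<count>` naming shown above.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical helpers for `grep -c` output.

# Sum the trailing ":<count>" field of each "file:count" line on stdin.
sum_grep_counts() {
    awk -F: '{ sum += $NF } END { print sum + 0 }'
}

# Order "system.log.<n>:<count>" lines by the numeric suffix <n>.
sort_by_log_number() {
    sort -t. -k3,3n
}
```

For example, `grep -ic "caused by.*too many open file" system.log.* | sort_by_log_number` lists the files in rotation order, and piping the same grep into `sum_grep_counts` gives the overall number of matching lines.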
All of these logs are from that day.