Re: cassandra hit a wall: Too many open files (98567!)
Ah, that explains part of the problem indeed. The whole situation still doesn't make a lot of sense to me, unless the answer is that the default sstable size with leveled compaction is just no good for large datasets. I restarted cassandra a few hours ago and it had to open about 32k files at start-up, which took about 15 minutes. That just can't be good...

I also noticed that when using compression the sstable size specified is the uncompressed size, so the actual files tend to be smaller. I have now raised the sstable size to 100MB, which should result in about 40MB files in my case.

Is there a way I can compact some of the existing sstables that are small? For example, I have a level-4 sstable that is 56KB in size and many more that are rather small. Does nodetool compact do anything with leveled compaction?

On 1/18/2012 2:39 AM, Janne Jalkanen wrote:
> 1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason?
>
> https://issues.apache.org/jira/browse/CASSANDRA-3616
>
> /Janne
>
> On Jan 18, 2012, at 03:52, dir dir wrote:
>> Very interesting. Why do you open so many files? What kind of system have you built that opens so many files? Would you tell us? Thanks...
>>
>> On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com wrote:
>>> I'm running a single-node cassandra 1.0.6 server which hit a wall yesterday:
>>>
>>> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main]
>>> java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system)
>>>
>>> After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting, it spent about 5 hours compacting, then quieted down. About 173k files are left in the data directory.
>>>
>>> I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty; most sstables are in gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens?
>>>
>>> This situation can't be healthy, can it? Suggestions?
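As a rough illustration of the sizes discussed in this message, the sketch below estimates how many sstables a dataset splits into for a given on-disk sstable size. The figures (about 150GB of compressed data, roughly 3MB files at the default setting, roughly 40MB files at the 100MB setting) come from the thread. The helper name is made up for illustration, and the estimate ignores the additional component files that accompany each sstable, so it is a back-of-the-envelope number rather than what Cassandra would report.

```python
import math


def estimate_sstable_count(total_on_disk_mb: float, sstable_on_disk_mb: float) -> int:
    """Rough number of sstables needed to hold the data (hypothetical helper)."""
    if sstable_on_disk_mb <= 0:
        raise ValueError("sstable size must be positive")
    return math.ceil(total_on_disk_mb / sstable_on_disk_mb)


# About 150GB of compressed data, as reported in the thread.
total_mb = 150 * 1024

# Default 5MB (uncompressed) setting, about 3MB per file on disk.
print(estimate_sstable_count(total_mb, 3))

# 100MB (uncompressed) setting, about 40MB per file on disk.
print(estimate_sstable_count(total_mb, 40))
```

Under these assumptions the larger setting cuts the sstable count by more than a factor of ten, which is the motivation for raising it.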
Re: cassandra hit a wall: Too many open files (98567!)
On Fri, Jan 13, 2012 at 8:01 PM, Thorsten von Eicken t...@rightscale.com wrote:
> I'm running a single-node cassandra 1.0.6 server which hit a wall yesterday:
>
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main]
> java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system)
>
> After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting, it spent about 5 hours compacting, then quieted down. About 173k files are left in the data directory.
>
> I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty; most sstables are in gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens?

No, with leveled compaction the (max) size of sstables is fixed whatever the generation is. The default is 5MB, but that is 5MB of uncompressed data (we may change that, though), so 3MB sounds about right. What changes between generations is the number of sstables each one can contain: gen 1 can have 10 sstables (it can have more, but only temporarily), gen 2 can have 100, gen 3 can have 1000, etc. So again, it is expected that most sstables are in gen 3 and 4.

> This situation can't be healthy, can it? Suggestions?

Leveled compaction uses lots of files (the number is proportional to the amount of data). It is not necessarily a big problem, as modern OSes deal with large numbers of open files fairly well (as far as I know at least).
I would just up the file descriptor ulimit and not worry too much about it, unless you have reasons to believe that it's an actual descriptor leak (but given the number of files you have, the number of open ones doesn't seem off, so I don't think there is one here) or that this has performance impacts.

--
Sylvain
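The level capacities described in this reply (10 sstables in gen 1, 100 in gen 2, 1000 in gen 3, and so on) can be sketched as below. This is an idealized model, not Cassandra's actual logic: it ignores gen 0, assumes levels fill strictly from the bottom up, and does not model the temporary overflow the reply mentions. The function name and the sample count of 5000 sstables are hypothetical.

```python
def fill_levels(sstable_count: int, fanout: int = 10) -> dict:
    """Distribute sstables across levels, filling level 1 first.

    Level N is assumed to hold at most fanout**N sstables
    (10, 100, 1000, ...). Level 0 is ignored in this sketch.
    """
    if sstable_count < 0:
        raise ValueError("sstable_count must be non-negative")
    levels = {}
    remaining = sstable_count
    level = 1
    while remaining > 0:
        capacity = fanout ** level
        levels[level] = min(remaining, capacity)
        remaining -= levels[level]
        level += 1
    return levels


# 5000 is a hypothetical sstable count chosen only for illustration.
print(fill_levels(5000))  # {1: 10, 2: 100, 3: 1000, 4: 3890}
```

In this model nearly all sstables end up in the top two populated levels, which matches the observation that most of them sit in gen 3 and 4.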
Re: cassandra hit a wall: Too many open files (98567!)
1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason?

https://issues.apache.org/jira/browse/CASSANDRA-3616

/Janne

On Jan 18, 2012, at 03:52, dir dir wrote:
> Very interesting. Why do you open so many files? What kind of system have you built that opens so many files? Would you tell us? Thanks...
>
> On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com wrote:
>> I'm running a single-node cassandra 1.0.6 server which hit a wall yesterday:
>>
>> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main]
>> java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system)
>>
>> After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting, it spent about 5 hours compacting, then quieted down. About 173k files are left in the data directory.
>>
>> I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty; most sstables are in gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens?
>>
>> This situation can't be healthy, can it? Suggestions?
Re: cassandra hit a wall: Too many open files (98567!)
Very interesting. Why do you open so many files? What kind of system have you built that opens so many files? Would you tell us? Thanks...

On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com wrote:
> I'm running a single-node cassandra 1.0.6 server which hit a wall yesterday:
>
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main]
> java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system)
>
> After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting, it spent about 5 hours compacting, then quieted down. About 173k files are left in the data directory.
>
> I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty; most sstables are in gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens?
>
> This situation can't be healthy, can it? Suggestions?
Re: cassandra hit a wall: Too many open files (98567!)
That sounds like too many sstables. Out of interest, were you using multithreaded compaction? Just wondering about this:

https://issues.apache.org/jira/browse/CASSANDRA-3711

Can you set the file handles to unlimited?

Can you provide some more info on what you see in the data dir, in case it is a bug in leveled compaction?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/01/2012, at 8:01 AM, Thorsten von Eicken wrote:
> I'm running a single-node cassandra 1.0.6 server which hit a wall yesterday:
>
> ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main]
> java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system)
>
> After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting, it spent about 5 hours compacting, then quieted down. About 173k files are left in the data directory.
>
> I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty; most sstables are in gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens?
>
> This situation can't be healthy, can it? Suggestions?
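The open-file counts and limits discussed in this thread can be checked with a short script like the one below. It is a minimal, Linux-only sketch: it reads `/proc`, and the helper names are made up for illustration. The error text "Too many open files in system" usually corresponds to the system-wide limit rather than the per-process one, so the sketch reads both. Here it inspects its own process as a stand-in; for a real check you would pass the pid of the Cassandra process, which may require running as the same user or as root.

```python
import os
import resource


def process_fd_usage(pid: int) -> tuple:
    """Return (open_fds, soft_limit, hard_limit) for a process (Linux only)."""
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    # With no new limits given, prlimit only reads the current limits.
    soft, hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    return open_fds, soft, hard


def system_fd_usage() -> tuple:
    """Return (allocated_handles, system_max) from /proc/sys/fs/file-nr."""
    with open("/proc/sys/fs/file-nr") as f:
        allocated, _unused, maximum = (int(x) for x in f.read().split())
    return allocated, maximum


# Inspect this script's own process as a stand-in for the Cassandra pid.
open_fds, soft, hard = process_fd_usage(os.getpid())
print(f"process: {open_fds} open, soft limit {soft}, hard limit {hard}")

allocated, maximum = system_fd_usage()
print(f"system: {allocated} allocated, max {maximum}")
```

A limit reported as `-1` means unlimited (`resource.RLIM_INFINITY`).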