Re: cassandra hit a wall: Too many open files (98567!)

2012-01-19 Thread Thorsten von Eicken
Ah, that explains part of the problem indeed. The whole situation still
doesn't make a lot of sense to me, unless the answer is that the default
sstable size with level compaction is just no good for large datasets. I
restarted cassandra a few hours ago and it had to open about 32k files
at start-up. Took about 15 minutes. That just can't be good...

I also noticed that when using compression the sstable size specified is
uncompressed, so the actual files tend to be smaller. I now upped the
sstable size to 100MB, which should result in about 40MB files in my
case. Is there a way I can compact some of the existing sstables that
are small? For example, I have a level-4 sstable that is 56KB in size
and many more that are rather small. Does nodetool compact do anything
with level compaction?
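
For anyone following along, the change amounts to something like the
following in cassandra-cli (a sketch; the keyspace and column family names
are guessed from the data path in the error quoted below):

    use rslog_production;
    update column family req_word_idx
      with compaction_strategy = 'LeveledCompactionStrategy'
      and compaction_strategy_options = {sstable_size_in_mb: 100};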

On 1/18/2012 2:39 AM, Janne Jalkanen wrote:

 1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason?

 https://issues.apache.org/jira/browse/CASSANDRA-3616

 /Janne

 On Jan 18, 2012, at 03:52 , dir dir wrote:

 Very interesting. Why do you have so many files open? What kind of
 system are you building that opens so many files? Would you tell us?
 Thanks...


 On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken
 t...@rightscale.com wrote:

 I'm running a single node cassandra 1.0.6 server which hit a wall
 yesterday:

 ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
 AbstractCassandraDaemon.java (line 133) Fatal exception in thread
 Thread[CompactionExecutor:2918,1,main] java.io.IOError:
 java.io.FileNotFoundException:
 /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
 open files in system)

 After that it stopped working and just sat there with this error
 (understandable). I did an lsof and saw that it had 98567 open files,
 yikes! An ls in the data directory shows 234011 files. After restarting
 it spent about 5 hours compacting, then quieted down. About 173k files
 left in the data directory. I'm using leveled compaction (with
 compression). I looked into the json of the two large CFs and gen 0 is
 empty; most sstables are gen 3 & 4. I have a total of about 150GB of
 data (compressed). Almost all the sstables are around 3MB in size.
 Aren't they supposed to get 10x bigger at higher gens?

 This situation can't be healthy, can it? Suggestions?





Re: cassandra hit a wall: Too many open files (98567!)

2012-01-18 Thread Sylvain Lebresne
On Fri, Jan 13, 2012 at 8:01 PM, Thorsten von Eicken t...@rightscale.com 
wrote:
 I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:

 ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
 AbstractCassandraDaemon.java (line 133) Fatal exception in thread
 Thread[CompactionExecutor:2918,1,main] java.io.IOError:
 java.io.FileNotFoundException:
 /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
 open files in system)

 After that it stopped working and just sat there with this error
 (understandable). I did an lsof and saw that it had 98567 open files,
 yikes! An ls in the data directory shows 234011 files. After restarting
 it spent about 5 hours compacting, then quieted down. About 173k files
 left in the data directory. I'm using leveled compaction (with
 compression). I looked into the json of the two large CFs and gen 0 is
 empty; most sstables are gen 3 & 4. I have a total of about 150GB of
 data (compressed). Almost all the sstables are around 3MB in size.
 Aren't they supposed to get 10x bigger at higher gens?

No, with leveled compaction the (max) size of sstables is fixed whatever
the generation is. The default is 5MB, but it's 5MB of uncompressed data
(we may change that though), so 3MB sounds about right.
What changes between generations is the number of sstables each one can
contain: gen 1 can have 10 sstables (it can have more, but only
temporarily), gen 2 can have 100, gen 3 can have 1000, etc. So again,
that most sstables are in gen 3 and 4 is expected too.
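
To make that concrete, a quick sketch of the capacities those numbers
imply, taking the default 5MB target at face value:

    # each level N holds roughly 10^N sstables of ~5MB (uncompressed) each
    for lvl in 1 2 3 4; do
      echo "gen $lvl: ~$((10 ** lvl)) sstables, ~$((10 ** lvl * 5))MB"
    done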

 This situation can't be healthy, can it? Suggestions?

Leveled compaction uses lots of files (the number is proportional to the
amount of data). That is not necessarily a big problem, as modern OSes
handle a large number of open files fairly well (as far as I know at
least). I would just raise the file descriptor ulimit and not worry too
much about it, unless you have reason to believe that there's an actual
descriptor leak (but given the number of files you have, the number of
open ones doesn't seem off, so I don't think there is one here) or that
it has a performance impact.
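
Concretely, something along these lines should do it (a sketch, assuming
the node runs as the cassandra user and that limits.conf is honored by
whatever starts the daemon):

    # one-off, in the shell that launches cassandra:
    ulimit -n 1000000
    # or persistently, via /etc/security/limits.conf:
    cassandra  soft  nofile  1000000
    cassandra  hard  nofile  1000000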

--
Sylvain


Re: cassandra hit a wall: Too many open files (98567!)

2012-01-18 Thread Janne Jalkanen

1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason?

https://issues.apache.org/jira/browse/CASSANDRA-3616

/Janne

On Jan 18, 2012, at 03:52 , dir dir wrote:

 Very interesting. Why do you have so many files open? What kind of
 system are you building that opens so many files? Would you tell us?
 Thanks...
 
 
 On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com 
 wrote:
 I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
 
 ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
 AbstractCassandraDaemon.java (line 133) Fatal exception in thread
 Thread[CompactionExecutor:2918,1,main] java.io.IOError:
 java.io.FileNotFoundException:
 /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
 open files in system)
 
 After that it stopped working and just sat there with this error
 (understandable). I did an lsof and saw that it had 98567 open files,
 yikes! An ls in the data directory shows 234011 files. After restarting
 it spent about 5 hours compacting, then quieted down. About 173k files
 left in the data directory. I'm using leveled compaction (with
 compression). I looked into the json of the two large CFs and gen 0 is
 empty; most sstables are gen 3 & 4. I have a total of about 150GB of
 data (compressed). Almost all the sstables are around 3MB in size.
 Aren't they supposed to get 10x bigger at higher gens?
 
 This situation can't be healthy, can it? Suggestions?
 



Re: cassandra hit a wall: Too many open files (98567!)

2012-01-17 Thread dir dir
Very interesting. Why do you have so many files open? What kind of
system are you building that opens so many files? Would you tell us?
Thanks...


On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com wrote:

 I'm running a single node cassandra 1.0.6 server which hit a wall
 yesterday:

 ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
 AbstractCassandraDaemon.java (line 133) Fatal exception in thread
 Thread[CompactionExecutor:2918,1,main] java.io.IOError:
 java.io.FileNotFoundException:
 /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
 open files in system)

 After that it stopped working and just sat there with this error
 (understandable). I did an lsof and saw that it had 98567 open files,
 yikes! An ls in the data directory shows 234011 files. After restarting
 it spent about 5 hours compacting, then quieted down. About 173k files
 left in the data directory. I'm using leveled compaction (with
 compression). I looked into the json of the two large CFs and gen 0 is
 empty; most sstables are gen 3 & 4. I have a total of about 150GB of
 data (compressed). Almost all the sstables are around 3MB in size.
 Aren't they supposed to get 10x bigger at higher gens?

 This situation can't be healthy, can it? Suggestions?



Re: cassandra hit a wall: Too many open files (98567!)

2012-01-15 Thread aaron morton
That sounds like too many sstables.

Out of interest, were you using multithreaded compaction? Just wondering about
this:
https://issues.apache.org/jira/browse/CASSANDRA-3711

Can you set the file handles to unlimited?

Can you provide some more info on what you see in the data dir, in case it is
a bug in leveled compaction.
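
Something along these lines would be enough (a rough sketch; the data path
is taken from your error message and the process match on CassandraDaemon
is assumed):

    # sstable files per column family (name guessed by stripping -hc-...)
    ls /mnt/ebs/data/rslog_production | sed 's/-hc-.*//' | sort | uniq -c | sort -rn
    # descriptors the cassandra process currently holds open
    lsof -p $(pgrep -f CassandraDaemon) | wc -l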

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/01/2012, at 8:01 AM, Thorsten von Eicken wrote:

 I'm running a single node cassandra 1.0.6 server which hit a wall yesterday:
 
 ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327
 AbstractCassandraDaemon.java (line 133) Fatal exception in thread
 Thread[CompactionExecutor:2918,1,main] java.io.IOError:
 java.io.FileNotFoundException:
 /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many
 open files in system)
 
 After that it stopped working and just sat there with this error
 (understandable). I did an lsof and saw that it had 98567 open files,
 yikes! An ls in the data directory shows 234011 files. After restarting
 it spent about 5 hours compacting, then quieted down. About 173k files
 left in the data directory. I'm using leveled compaction (with
 compression). I looked into the json of the two large CFs and gen 0 is
 empty; most sstables are gen 3 & 4. I have a total of about 150GB of
 data (compressed). Almost all the sstables are around 3MB in size.
 Aren't they supposed to get 10x bigger at higher gens?
 
 This situation can't be healthy, can it? Suggestions?