Re: Why so many SSTables?

2012-04-12 Thread Romain HARDOUIN
I've just opened a new JIRA: CASSANDRA-4142. I've double-checked the numbers; 7747 seems to be the array list object's capacity (Eclipse Memory Analyzer displays java.lang.Object[7747] @ 0x7d3f3f798). There are actually 5757 browsable entries in EMA, therefore each object is about 140 KB (size varies
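A back-of-the-envelope check of the figures quoted above (a sketch, assuming the ~140 KB per-entry estimate from Eclipse Memory Analyzer is representative of every entry):

```python
# Rough heap retained by the scanner list described in this thread.
# Assumes each of the ~5757 populated entries retains ~140 KB, per the
# Eclipse Memory Analyzer figures quoted above (an estimate, not exact).
entries = 5757
bytes_per_entry = 140 * 1024  # ~140 KB retained per SSTableBoundedScanner

total_bytes = entries * bytes_per_entry
total_mb = total_bytes / (1024 ** 2)
print(f"~{total_mb:.0f} MB retained")  # ~787 MB
```

Roughly 800 MB retained by a single ArrayList goes a long way toward explaining an OOM during repair on a modest heap.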

Re: Why so many SSTables?

2012-04-12 Thread Thorsten von Eicken
From my experience I would strongly advise against leveled compaction for your use case. But you should certainly test and see for yourself! I have ~1TB on a node with ~13GB of heap. I ended up with 30k SSTables. I raised the SSTable size to 100MB, but that didn't prove to be sufficient and I did

Re: Why so many SSTables?

2012-04-11 Thread Romain HARDOUIN
Thank you for your answers. I originally posted this question because we encountered an OOM exception on 2 nodes during a repair session. Memory analysis shows a hotspot: an ArrayList of SSTableBoundedScanner which contains as many objects as there are SSTables on disk (7747 objects at the time).

Re: Why so many SSTables?

2012-04-11 Thread Sylvain Lebresne
On Wed, Apr 11, 2012 at 2:43 PM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote: Thank you for your answers. I originally posted this question because we encountered an OOM exception on 2 nodes during a repair session. Memory analysis shows a hotspot: an ArrayList of SSTableBoundedScanner

Re: Why so many SSTables?

2012-04-11 Thread Dave Brosius
It's easy to spend other people's money, but handling 1TB of data with a 1.5 GB heap? Memory is cheap, and just a little more will solve many problems. On 04/11/2012 08:43 AM, Romain HARDOUIN wrote: Thank you for your answers. I originally posted this question because we encountered an OOM

Re: Why so many SSTables?

2012-04-11 Thread aaron morton
In general I would limit the data load per node to 300 to 400GB. Otherwise things can get painful when it comes time to run compaction / repair / move. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/04/2012, at 1:00 AM, Dave Brosius

Re: Why so many SSTables?

2012-04-11 Thread Ben Coverston
In general I would limit the data load per node to 300 to 400GB. Otherwise things can get painful when it comes time to run compaction / repair / move. +1 on more nodes of moderate size

Re: Why so many SSTables?

2012-04-11 Thread Watanabe Maki
If you increase sstable_size_in_mb to 200MB, you will need more IO for each compaction. For example, if your memtable is flushed and LCS needs to compact it with 10 overlapping L1 sstables, you will need almost 2GB of reads and 2GB of writes for that single compaction. From iPhone On
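The IO cost described above can be sketched numerically (a hypothetical illustration; the 200 MB size and the 10 overlapping L1 tables are the round figures from the mail, not measured values):

```python
# Rough IO cost of one leveled compaction triggered by a memtable flush,
# using the hypothetical round numbers from the mail above.
sstable_size_mb = 200          # proposed sstable_size_in_mb value
overlapping_l1_tables = 10     # L1 sstables overlapping the flushed data

read_mb = sstable_size_mb * overlapping_l1_tables   # data read from L1
write_mb = read_mb                                  # roughly the same amount rewritten
print(f"read ~{read_mb / 1024:.1f} GB, write ~{write_mb / 1024:.1f} GB")
```

So a larger sstable_size_in_mb trades fewer files for noticeably more IO per compaction.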

Why so many SSTables?

2012-04-10 Thread Romain HARDOUIN
Hi, We are surprised by the number of files generated by Cassandra. Our cluster consists of 9 nodes and each node handles about 35 GB. We're using Cassandra 1.0.6 with LeveledCompactionStrategy. We have 30 CFs. We've got roughly 45,000 files under the keyspace directory on each node: ls -l
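The file count reported above is consistent with LCS's default 5 MB target size. A sketch of the arithmetic, assuming roughly 6 on-disk component files per sstable (Data, Index, Filter, Statistics, etc.; that count is an estimate for Cassandra 1.0, not a figure from the mail):

```python
# Why ~45,000 files: 35 GB per node at LCS's default 5 MB sstable size,
# times several component files per sstable. The 6-files-per-sstable
# figure is an assumption about Cassandra 1.0's on-disk format.
data_per_node_mb = 35 * 1024
sstable_size_mb = 5            # LCS default sstable_size_in_mb
files_per_sstable = 6          # assumed component files per sstable

sstables = data_per_node_mb / sstable_size_mb
total_files = sstables * files_per_sstable
print(f"~{sstables:.0f} sstables, ~{total_files:.0f} files")  # ~7168 sstables, ~43008 files
```

That lands within sight of the ~45,000 files observed, so the count is expected behavior for LCS at the default size rather than a malfunction.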

Re: Why so many SSTables?

2012-04-10 Thread Jonathan Ellis
LCS explicitly tries to keep sstables under 5MB to minimize extra work done by compacting data that didn't really overlap across different levels. On Tue, Apr 10, 2012 at 9:24 AM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote: Hi, We are surprised by the number of files generated by

Re: Why so many SSTables?

2012-04-10 Thread Maki Watanabe
You can configure the sstable size via the sstable_size_in_mb parameter for LCS. The default value is 5MB. You should also check that you don't have many pending compaction tasks, using nodetool tpstats and compactionstats. If you have enough IO throughput, you can increase