> We've learned that compaction strategy would be an important point, because 
> we've run into 'no space' trouble with the 'size-tiered' compaction 
> strategy.
If you want to get the most out of the raw disk space, LCS (Leveled Compaction 
Strategy) is the way to go; just remember it uses approximately twice the disk IO. 
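
Something like the statement below will switch an existing table to LCS (the 
keyspace / table names here are made up, and the sstable_size_in_mb shown is 
the usual default). Note the switch kicks off a recompaction of the existing 
SSTables, so roll it out carefully:

    ALTER TABLE my_keyspace.my_table
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 160};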

> From our experience, changing any settings/schema while a large cluster is 
> online and has been running for some time is really, really a pain.
Which parts in particular? 

Updating the schema or config? OpsCenter has a rolling restart feature which 
can be handy when chef / puppet is deploying the config changes. Schema / 
gossip can take a little while to propagate with a high number of nodes. 
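
You can check that a schema change has reached every node with nodetool; once 
all nodes report the same schema version the change has finished propagating:

    # run against any node in the cluster
    nodetool describecluster
    # the "Schema versions" section should show a single version shared by
    # all nodes; multiple versions mean the change is still propagating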
 
On a modern version you should be able to run 2 to 3 TB per node, maybe higher. 
The biggest concerns are going to be repair (the changes in 2.1 will help) and 
bootstrapping. I’d recommend testing a smaller cluster, say 12 nodes, with a 
high load per node, around 3 TB. 
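If you do test the smaller cluster, the stress tool that ships with 2.1 can 
drive a write-heavy load; a rough sketch (the row count, column size, and 
thread count here are assumptions, tune them to match your ~300K writes):

    # write 10M partitions with a single ~300KB column each
    cassandra-stress write n=10000000 -col size=fixed(300000) -rate threads=50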

cheers
Aaron
 
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 12:09 pm, Yatong Zhang <bluefl...@gmail.com> wrote:

> Hi,
> 
> We're going to deploy a large Cassandra cluster at the PB level. Our scenario 
> would be:
> 
> 1. Lots of writes, about 150 writes/second on average, and about 300K size 
> per write.
> 2. Relatively very few reads
> 3. Our data will never be updated
> 4. But we will delete old data periodically to free space for new data
> 
> We've learned that compaction strategy would be an important point, because 
> we've run into 'no space' trouble with the 'size-tiered' compaction 
> strategy.
> 
> We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations; is 
> this enough, and is it up to date? From our experience, changing any 
> settings/schema while a large cluster is online and has been running for 
> some time is really, really a pain. So we're gathering more info and expecting 
> some more practical suggestions before we set up the Cassandra cluster. 
> 
> Thanks, and any help is greatly appreciated
