Re: Understand how to provision nodes and use cassandra in the production

Jonathan Ellis Sat, 30 Jan 2010 05:59:31 -0800

The thing that will help most in 0.5 is to increase your
KeysCachedFraction to 0.2 or even more, depending on your workload.


On Sat, Jan 30, 2010 at 5:23 AM, Suhail Doshi <digitalwarf...@gmail.com> wrote:
> An issue I've been seeing is it's really hard to scale Cassandra with reads.
> I've run top, vmstat, iostat. vmstat shows no swapping but iostat shows
> heavy saturation of %util and await times over 90ms with max rMB/s of 7-8.
>
> I have over 7 GB of memory dedicated across two nodes. I am wondering what the
> issue might be and how to solve it. I felt like 7 GB would be enough.
>
> Suhail
>
> On Thu, Jan 28, 2010 at 7:32 PM, Ray Slakinski <r...@mahalo.com> wrote:
>
>> Cassandra auto-shards, so you just need to point at your cluster and
>> Cassandra does the rest. You should read up on the different partitioners
>> before you go live in production, though, because it's not easy to switch
>> once you make that decision.
>>
>> http://wiki.apache.org/cassandra/StorageConfiguration#Partitioner
>>
>> Ray Slakinski
>> On 2010-01-28, at 7:29 PM, Suhail Doshi wrote:
>>
>> > Another piece I am interested in is how cassandra distributes the data
>> > automatically. In MySQL you need to shard and you'd pick the shard to
>> > request info from--how does that translate in cassandra?
>> >
>> > On Thu, Jan 28, 2010 at 7:23 PM, Suhail Doshi <suh...@mixpanel.com> wrote:
>> >
>> >> We've started to use Cassandra in production and just have one node right
>> >> now. Here's one of our ColumnFamilies:
>> >>
>> >> 16G Jan 28 22:28 SomeIndex-5467-Index.db
>> >> 196M Jan 28 22:32 SomeIndex-5487-Index.db
>> >>
>> >> The first bottleneck you encounter is reads--writes are extremely fast
>> >> even with one node.
>> >>
>> >> My question is, is the size of the *-Index.db files the amount of RAM
>> >> you need available for Cassandra to do reads fast?
>> >>
>> >> What are some configuration options you would need to tweak besides
>> >> making the JVM's max memory size larger? Are there any default
>> >> configurations that are commonly missed?
>> >>
>> >> Next, if you provision more nodes, will Cassandra distribute the data in
>> >> memory so I don't need a single 16 GB node? Is there anything I need to
>> >> build into my application logic to make this work correctly? Ideally, if I
>> >> had a 16 GB index, I'd want it spread across four 4 GB nodes. Can any
>> >> client connect to any one node, request info, and get it back from a node
>> >> that has that part of the index in memory?
>> >>
>> >> What's the best way to do efficient reads?
>> >>
>> >> Suhail
>> >>
>> >>
>>
>>
>
>
> --
> http://mixpanel.com
> Blog: http://blog.mixpanel.com
>
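
To illustrate the auto-sharding point made earlier in the thread, here is a minimal Python sketch of how a hash-based partitioner can map row keys onto a ring of nodes. This is a simplified model, not Cassandra's actual code: the node names, the evenly spaced tokens, the 2**127 ring size, and the helper functions are all assumptions made for illustration, and replication is ignored entirely.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(key: str) -> int:
    """Hash a row key to a position on the ring (MD5-based in this sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(nodes):
    """Give each node an evenly spaced token; return a sorted list of (token, node)."""
    step = RING_SIZE // len(nodes)
    return sorted((i * step, node) for i, node in enumerate(nodes))


def owner_of(ring, key: str) -> str:
    """Return the node holding the first token >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(key))
    return ring[idx % len(ring)][1]


# Hypothetical four-node cluster, as in the "four 4 GB nodes" question.
ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
```

Calling `owner_of(ring, "some-row-key")` returns the same node every time for a given key, which is why the client does not need to pick a shard itself: whichever node receives the request can work out which node owns the key.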
