Additionally:

Address      Status   Load       Range                                      Ring
                                 85079825064071324593650466313420553448
<seed_ip>    Up       58.08 GB   25804699734015282125022172898213238764    |<--|
<non_seed>   Up       19.71 GB   85079825064071324593650466313420553448    |-->|

On Sat, Jan 30, 2010 at 3:32 AM, Suhail Doshi <digitalwarf...@gmail.com> wrote:

> I should note that the new node has been bootstrapped and the data has been
> distributed, which further perplexes me.
>
> The index file I am reading off is about 16G.
>
> On Sat, Jan 30, 2010 at 3:23 AM, Suhail Doshi <digitalwarf...@gmail.com> wrote:
>
>> An issue I've been seeing is that it's really hard to scale Cassandra for
>> reads. I've run top, vmstat, and iostat. vmstat shows no swapping, but
>> iostat shows heavy saturation of %util, await times over 90 ms, and a max
>> rMB/s of 7-8.
>>
>> I have over 7 GB of memory dedicated across two nodes. I am wondering what
>> the issue might be and how to solve it; I felt like 7 GB would be enough.
>>
>> Suhail
>>
>> On Thu, Jan 28, 2010 at 7:32 PM, Ray Slakinski <r...@mahalo.com> wrote:
>>
>>> Cassandra auto-shards, so you just need to point at your cluster and
>>> Cassandra does the rest. You should read up on the different partitioners
>>> before you go live in production, though, because it's not easy to switch
>>> once you've made that decision.
>>>
>>> http://wiki.apache.org/cassandra/StorageConfiguration#Partitioner
>>>
>>> Ray Slakinski
>>>
>>> On 2010-01-28, at 7:29 PM, Suhail Doshi wrote:
>>>
>>>> Another piece I am interested in is how Cassandra distributes the data
>>>> automatically. In MySQL you need to shard, and you'd pick the shard to
>>>> request info from--how does that translate to Cassandra?
>>>>
>>>> On Thu, Jan 28, 2010 at 7:23 PM, Suhail Doshi <suh...@mixpanel.com> wrote:
>>>>
>>>>> We've started to use Cassandra in production and have just one node
>>>>> right now. Here's one of our ColumnFamilies:
>>>>>
>>>>> 16G   Jan 28 22:28   SomeIndex-5467-Index.db
>>>>> 196M  Jan 28 22:32   SomeIndex-5487-Index.db
>>>>>
>>>>> The first bottleneck you encounter is reads--writes are extremely fast
>>>>> even with one node.
>>>>>
>>>>> My question is: is the size of the *-Index.db files the amount of RAM
>>>>> you need available for Cassandra to do reads fast?
>>>>>
>>>>> What configuration options would you need to tweak besides raising the
>>>>> JVM's max memory size? Are there any default settings that are commonly
>>>>> missed?
>>>>>
>>>>> Next, if you provision more nodes, will Cassandra distribute the data in
>>>>> memory so I don't need a single 16 GB node? Is there anything I need to
>>>>> build into my application logic to make this work correctly? Ideally, if
>>>>> I had a 16 GB index, I'd want it spread across four 4 GB nodes. Can any
>>>>> client connect to any one node, request info, and get the info back from
>>>>> a node that has that part of the index in memory?
>>>>>
>>>>> What's the best way to do efficient reads?
>>>>>
>>>>> Suhail

--
http://mixpanel.com
Blog: http://blog.mixpanel.com
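
For what it's worth, the ownership split implied by the two tokens in the ring
output above can be checked directly. A minimal sketch, assuming the
RandomPartitioner token space of 0..2**127 (the tokens and load figures are
copied from the ring output; the partitioner itself is an assumption, since the
thread doesn't say which one is in use):

    # Ring ownership implied by the tokens in the nodetool ring output above,
    # assuming RandomPartitioner's token space of 0 .. 2**127.
    RING = 2 ** 127

    seed_token     = 25804699734015282125022172898213238764   # <seed_ip>, 58.08 GB
    non_seed_token = 85079825064071324593650466313420553448   # <non_seed>, 19.71 GB

    # Each node owns the range (previous token, its own token], wrapping around.
    seed_share     = float(RING - non_seed_token + seed_token) / RING
    non_seed_share = float(non_seed_token - seed_token) / RING

    print("seed owns     %.1f%% of the ring" % (100 * seed_share))      # ~65.2%
    print("non-seed owns %.1f%% of the ring" % (100 * non_seed_share))  # ~34.8%

So the seed node owns roughly two thirds of the ring, which goes some way
toward explaining the skew in the Load column (58.08 GB vs 19.71 GB) even
before replication and compaction are considered.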
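
And if the eventual goal is the "16 GB index spread across four 4 GB nodes"
setup described earlier in the thread, the usual way to get an even split under
RandomPartitioner is to give each node an explicit InitialToken at even spacing
around the ring rather than letting bootstrap pick tokens. A minimal sketch of
that calculation (the four-node count comes from the thread; the even-spacing
formula is the standard one for RandomPartitioner):

    # Evenly spaced tokens for an N-node cluster under RandomPartitioner.
    # Each value would go into one node's InitialToken setting.
    RING = 2 ** 127

    def balanced_tokens(node_count):
        """Split the 0..2**127 ring into node_count equal arcs."""
        return [i * RING // node_count for i in range(node_count)]

    for n, token in enumerate(balanced_tokens(4)):
        print("node %d: InitialToken = %d" % (n, token))

With tokens like these each node owns exactly a quarter of the ring, so
(ignoring replication) a 16 GB index would land at roughly 4 GB per node.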