Additionally:

Address      Status   Load       Range                                      Ring
                                 85079825064071324593650466313420553448
<seed_ip>    Up       58.08 GB   25804699734015282125022172898213238764    |<--|
<non_seed>   Up       19.71 GB   85079825064071324593650466313420553448    |-->|

On Sat, Jan 30, 2010 at 3:32 AM, Suhail Doshi <digitalwarf...@gmail.com> wrote:

> I should note that the new node has been bootstrapped and the data has been
> distributed, which further perplexes me.
>
> The index file I am reading off is about 16G.
>
> On Sat, Jan 30, 2010 at 3:23 AM, Suhail Doshi <digitalwarf...@gmail.com> wrote:
>
>> An issue I've been seeing is that it's really hard to scale Cassandra for
>> reads. I've run top, vmstat, and iostat. vmstat shows no swapping, but
>> iostat shows heavy saturation of %util, await times over 90 ms, and a max
>> rMB/s of 7-8.
>>
>> I have over 7 GB of memory dedicated across two nodes. I am wondering what
>> the issue might be and how to solve it; I felt like 7 GB would be enough.
>>
>> Suhail
>>
>> On Thu, Jan 28, 2010 at 7:32 PM, Ray Slakinski <r...@mahalo.com> wrote:
>>
>>> Cassandra auto-shards, so you just need to point at your cluster and
>>> Cassandra does the rest. You should read up on the different partitioners
>>> before you go live in production, though, because it's not easy to switch
>>> once you've made that decision.
>>>
>>> http://wiki.apache.org/cassandra/StorageConfiguration#Partitioner
>>>
>>> Ray Slakinski
>>>
>>> On 2010-01-28, at 7:29 PM, Suhail Doshi wrote:
>>>
>>>> Another piece I am interested in is how Cassandra distributes the data
>>>> automatically. In MySQL you need to shard, and you'd pick the shard to
>>>> request info from--how does that translate to Cassandra?
>>>>
>>>> On Thu, Jan 28, 2010 at 7:23 PM, Suhail Doshi <suh...@mixpanel.com> wrote:
>>>>
>>>>> We've started to use Cassandra in production and have just one node
>>>>> right now. Here's one of our ColumnFamilies:
>>>>>
>>>>> 16G   Jan 28 22:28   SomeIndex-5467-Index.db
>>>>> 196M  Jan 28 22:32   SomeIndex-5487-Index.db
>>>>>
>>>>> The first bottleneck you encounter is reads--writes are extremely fast
>>>>> even with one node.
>>>>>
>>>>> My question is: is the size of the *-Index.db files the amount of RAM
>>>>> you need available for Cassandra to do reads fast?
>>>>>
>>>>> What configuration options would you need to tweak besides raising the
>>>>> JVM's max memory size? Are there any default settings that are commonly
>>>>> missed?
>>>>>
>>>>> Next, if you provision more nodes, will Cassandra distribute the data in
>>>>> memory so I don't need a single 16 GB node? Is there anything I need to
>>>>> build into my application logic to make this work correctly? Ideally, if
>>>>> I had a 16 GB index, I'd want it spread across four 4 GB nodes. Can any
>>>>> client connect to any one node, request info, and get the info back from
>>>>> a node that has that part of the index in memory?
>>>>>
>>>>> What's the best way to do efficient reads?
>>>>>
>>>>> Suhail

--
http://mixpanel.com
Blog: http://blog.mixpanel.com
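
For what it's worth, the ownership split implied by the two tokens in the ring
output above can be checked directly. A minimal sketch, assuming the
RandomPartitioner token space of 0..2**127 (the tokens and load figures are
copied from the ring output; the partitioner itself is an assumption, since the
thread doesn't say which one is in use):

    # Ring ownership implied by the tokens in the nodetool ring output above,
    # assuming RandomPartitioner's token space of 0 .. 2**127.
    RING = 2 ** 127

    seed_token     = 25804699734015282125022172898213238764   # <seed_ip>, 58.08 GB
    non_seed_token = 85079825064071324593650466313420553448   # <non_seed>, 19.71 GB

    # Each node owns the range (previous token, its own token], wrapping around.
    seed_share     = float(RING - non_seed_token + seed_token) / RING
    non_seed_share = float(non_seed_token - seed_token) / RING

    print("seed owns     %.1f%% of the ring" % (100 * seed_share))      # ~65.2%
    print("non-seed owns %.1f%% of the ring" % (100 * non_seed_share))  # ~34.8%

So the seed node owns roughly two thirds of the ring, which goes some way
toward explaining the skew in the Load column (58.08 GB vs 19.71 GB) even
before replication and compaction are considered.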
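
And if the eventual goal is the "16 GB index spread across four 4 GB nodes"
setup described earlier in the thread, the usual way to get an even split under
RandomPartitioner is to give each node an explicit InitialToken at even spacing
around the ring rather than letting bootstrap pick tokens. A minimal sketch of
that calculation (the four-node count comes from the thread; the even-spacing
formula is the standard one for RandomPartitioner):

    # Evenly spaced tokens for an N-node cluster under RandomPartitioner.
    # Each value would go into one node's InitialToken setting.
    RING = 2 ** 127

    def balanced_tokens(node_count):
        """Split the 0..2**127 ring into node_count equal arcs."""
        return [i * RING // node_count for i in range(node_count)]

    for n, token in enumerate(balanced_tokens(4)):
        print("node %d: InitialToken = %d" % (n, token))

With tokens like these each node owns exactly a quarter of the ring, so
(ignoring replication) a 16 GB index would land at roughly 4 GB per node.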