Another piece I am interested in is how Cassandra distributes the data automatically. In MySQL you have to shard the data yourself and pick the shard to request info from. How does that translate to Cassandra?
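To make the MySQL side of that comparison concrete, here is a minimal sketch of the application-side shard selection I mean. The shard hostnames, the `pick_shard` helper, and the md5-modulo scheme are all assumptions for illustration, not taken from any real setup.

```python
import hashlib

# Hypothetical shard list; these hostnames are placeholders, not real servers.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def pick_shard(key, shards=SHARDS):
    """Map a key to one shard using a stable hash (md5 modulo shard count)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest, "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    # The application has to do this lookup before every query.
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", pick_shard(key))
```

The application owns this mapping and must call something like `pick_shard` before every read or write. What I am asking is whether Cassandra takes over that step, so the client no longer has to know which node holds a given key.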
On Thu, Jan 28, 2010 at 7:23 PM, Suhail Doshi <suh...@mixpanel.com> wrote:
> We've started to use Cassandra in production and just have one node right
> now. Here's one of our ColumnFamilies:
>
> 16G Jan 28 22:28 SomeIndex-5467-Index.db
> 196M Jan 28 22:32 SomeIndex-5487-Index.db
>
> The first bottleneck you encounter is reads--writes are extremely fast even
> with one node.
>
> My question is, is the size of the *-Index.db files the amount of RAM you
> need available for Cassandra to do reads fast?
>
> What are some configuration options you would need to tweak besides raising
> the JVM's max memory size? Are there any default configurations that are
> commonly missed?
>
> Next, if you provision more nodes, will Cassandra distribute the data in
> memory so I don't need a single 16 GB node? Is there anything I need to build
> into my application logic to make this work correctly? Ideally, if I had a
> 16 GB index, I'd want it spread across four 4 GB nodes. Can any client
> connect to any one node, request info, and get it back from a node that has
> that part of the index in memory?
>
> What's the best way to do efficient reads?
>
> Suhail