Re: How to avoid flush if the data can fit into memtable

2017-06-02 Thread preetika tyagi
Great explanation and blog post, Akhil. Sorry for the delayed response (somehow I didn't notice the email in my inbox), but this is what I concluded as well. In addition to compression, I believe the SSTable is serialized as well, and the combination of both results in a much smaller SSTable.

Re: How to avoid flush if the data can fit into memtable

2017-06-02 Thread Jeff Jirsa
On 2017-05-24 17:42 (-0700), preetika tyagi wrote:
> Hi,
>
> I'm running Cassandra with a very small dataset so that the data can exist
> on memtable only. Below are my configurations:
>
> In jvm.options:
>
> -Xms4G
> -Xmx4G
>
> In cassandra.yaml,

Re: How to avoid flush if the data can fit into memtable

2017-06-01 Thread Akhil Mehra
Kevin, Stefan, thanks for the positive feedback and questions. Stefan, in the blog post I am generally going by Apache Cassandra defaults. The memtable cleanup threshold is 1/(1 + memtable_flush_writers). By default, memtable_flush_writers is two. This comes to 33 percent of
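The default arithmetic above can be worked through in a few lines. This is a minimal sketch, assuming the 4 GB heap from the original post, two flush writers, and that memtable_heap_space_in_mb is left at its default of one quarter of the heap. The function names are hypothetical helpers, not Cassandra APIs.

```python
def default_cleanup_threshold(memtable_flush_writers: int = 2) -> float:
    """Default memtable_cleanup_threshold: 1 / (1 + memtable_flush_writers)."""
    return 1.0 / (1 + memtable_flush_writers)


def flush_trigger_mb(memtable_space_mb: float, cleanup_threshold: float) -> float:
    """Approximate non-flushing memtable usage (MB) at which the largest
    memtable is flushed: the permitted space scaled by the threshold."""
    return memtable_space_mb * cleanup_threshold


heap_mb = 4 * 1024               # -Xms4G / -Xmx4G from the original post
memtable_heap_mb = heap_mb / 4   # assumption: default of 1/4 of the heap
threshold = default_cleanup_threshold(2)

print(f"memtable_cleanup_threshold = {threshold:.2f}")  # 0.33
print(f"flush triggered at ~{flush_trigger_mb(memtable_heap_mb, threshold):.0f} MB")  # ~341 MB
```

A threshold of 0.50, the figure quoted from the earlier answer mail, would put the same estimate at about 512 MB of the 1 GB memtable space.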

Re: How to avoid flush if the data can fit into memtable

2017-06-01 Thread Stefan Litsche
Hello Akhil, thanks for your great blog post. One thing I cannot reconcile: in the answer mail you write: "Note the cleanup threshold is .50 of 1GB and not a combination of heap and off heap space." In your blog post you write: "memtable_cleanup_threshold is the default value i.e. 33 percent

Re: How to avoid flush if the data can fit into memtable

2017-05-31 Thread Kevin O'Connor
Great post, Akhil! Thanks for explaining that.
On Mon, May 29, 2017 at 5:43 PM, Akhil Mehra wrote:
> Hi Preetika,
>
> After thinking about your scenario I believe your small SSTable size might
> be due to data compression. By default, all tables enable SSTable

Re: How to avoid flush if the data can fit into memtable

2017-05-29 Thread Akhil Mehra
Hi Preetika, After thinking about your scenario, I believe your small SSTable size might be due to data compression. By default, all tables enable SSTable compression. Let's go through your scenario. Let's say you have allocated 4GB to your Cassandra node. Your *memtable_heap_space_in_mb* and
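To see how much compression alone can shrink repetitive test data, here is a rough illustration. It is only a sketch: Python's standard library has no LZ4, so zlib stands in for Cassandra's default LZ4Compressor, the rows are made-up placeholders, and Cassandra compresses an SSTable in chunks rather than as one buffer, so real ratios will differ.

```python
import zlib

# Hypothetical, highly repetitive rows, similar in spirit to a tiny test dataset.
rows = [f"user_{i % 10},status=active,region=us-east-1\n".encode() for i in range(10_000)]
raw = b"".join(rows)

# zlib is a stand-in here for LZ4, which is not in the standard library.
compressed = zlib.compress(raw)

print(f"uncompressed: {len(raw)} bytes")
print(f"compressed:   {len(compressed)} bytes")
print(f"ratio:        {len(compressed) / len(raw):.4f}")
```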

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread preetika tyagi
I agree that for such a small dataset, Cassandra is obviously not needed. However, this is purely an experimental setup that I'm using to understand exactly how and when a memtable flush is triggered. As I mentioned in my post, I read the documentation and tweaked the parameters accordingly

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
It doesn't have to fit in memory. If your key distribution has strong temporal locality, then a larger memtable that can coalesce overwrites greatly reduces the disk I/O load for the memtable flush and subsequent compactions. Of course, I have no idea if this is what the OP had in mind. On
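The point about temporal locality can be made concrete with a toy simulation. This is a hypothetical model, not Cassandra code: the memtable is a dict keyed by partition key, an overwrite replaces the previous value in place, and a "flush" simply counts and clears the rows once a fixed number of distinct keys is held. The key counts and limits are arbitrary.

```python
import random


def rows_flushed(keys, memtable_limit):
    """Count rows written out by a toy memtable.

    Writes to a key already in the memtable are coalesced (latest wins);
    once memtable_limit distinct keys are held, the memtable is flushed.
    """
    memtable = {}
    flushed = 0
    for seq, key in enumerate(keys):
        memtable[key] = seq
        if len(memtable) >= memtable_limit:
            flushed += len(memtable)
            memtable.clear()
    return flushed + len(memtable)  # final flush of whatever is left


rng = random.Random(42)
writes = 100_000
hot = [rng.randrange(100) for _ in range(writes)]           # small hot key set
spread = [rng.randrange(1_000_000) for _ in range(writes)]  # little key reuse

print("hot keys:   ", rows_flushed(hot, memtable_limit=1_000))
print("spread keys:", rows_flushed(spread, memtable_limit=1_000))
```

With the hot key set, 100,000 writes collapse to at most 100 flushed rows; with keys spread over a large space, almost every write reaches disk.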

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread daemeon reiydelle
This sounds exactly like a previous post that ended when I asked the person to document the number of nodes, EC2 instance type, and size. I suspected a single-node system. So the poster reposts? Hmm.

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Jonathan Haddad
Sorry for the confusion. That was for the OP; I wrote it quickly right after waking up. What I'm asking is why the OP wants to keep his data in the memtable exclusively. If the goal is to "make reads fast", then just turn on row caching. If there's so little data that it fits in memory

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
Not sure whether you're asking me or the original poster, but the more times data gets overwritten in a memtable, the less it has to be compacted later on (and even without overwrites, larger memtables result in less compaction).
On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
> Why do you
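The claim that larger memtables mean less compaction, even without a hot key set, shows up in the same kind of toy model: with a bigger memtable, more random overwrites land while the key is still resident, and fewer SSTables are produced for compaction to merge. Again, this is a hypothetical sketch with arbitrary sizes, not Cassandra's flush logic.

```python
import random


def flush_stats(keys, memtable_limit):
    """Return (number_of_flushes, rows_flushed) for a toy dict-based memtable."""
    memtable = {}
    flushes = rows = 0
    for seq, key in enumerate(keys):
        memtable[key] = seq  # overwrite of a resident key is coalesced
        if len(memtable) >= memtable_limit:
            flushes += 1
            rows += len(memtable)
            memtable.clear()
    if memtable:  # final flush of the remainder
        flushes += 1
        rows += len(memtable)
    return flushes, rows


rng = random.Random(7)
keys = [rng.randrange(10_000) for _ in range(100_000)]  # uniform, no hot set

for limit in (500, 5_000):
    flushes, rows = flush_stats(keys, limit)
    print(f"memtable_limit={limit:>5}: {flushes:>4} flushes, {rows} rows written")
```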

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Jonathan Haddad
Why do you think keeping your data in the memtable is what you need to do?
On Thu, May 25, 2017 at 7:16 AM Avi Kivity wrote:
> Then it doesn't have to (it still may, for other reasons).
>
> On 05/25/2017 05:11 PM, preetika tyagi wrote:
> > What if the commit log is disabled?

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
Then it doesn't have to (it still may, for other reasons).
On 05/25/2017 05:11 PM, preetika tyagi wrote:
> What if the commit log is disabled?
> On May 25, 2017 4:31 AM, "Avi Kivity" wrote:
> > Cassandra has to flush the memtable occasionally, or

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread preetika tyagi
What if the commit log is disabled?
On May 25, 2017 4:31 AM, "Avi Kivity" wrote:
> Cassandra has to flush the memtable occasionally, or the commit log grows
> without bounds.
>
> On 05/25/2017 03:42 AM, preetika tyagi wrote:
> > Hi,
> >
> > I'm running Cassandra with a very small

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
Cassandra has to flush the memtable occasionally, or the commit log grows without bounds.
On 05/25/2017 03:42 AM, preetika tyagi wrote:
> Hi,
> I'm running Cassandra with a very small dataset so that the data can exist on memtable only. Below are my configurations:
> In jvm.options:
> -Xms4G
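Why flushing is unavoidable while the commit log is on can be shown with a simplified, hypothetical model: a commit log segment can be discarded only once every write it holds has been flushed from the memtable, so without flushes the number of live segments grows with every write. The cap below is a stand-in for commitlog_total_space_in_mb, which in Cassandra causes the oldest dirty memtables to be flushed so old segments can be recycled. Segment sizes and counts are arbitrary.

```python
from typing import List, Optional


class CommitLogModel:
    """Toy commit log: a segment is reclaimable only after the writes it
    holds have been flushed from the memtable into an SSTable."""

    def __init__(self, segment_size: int, max_segments: Optional[int] = None):
        self.segment_size = segment_size
        self.max_segments = max_segments  # stand-in for commitlog_total_space_in_mb
        self.segments: List[int] = [0]    # write count per live segment
        self.flushes = 0

    def write(self) -> None:
        if self.segments[-1] == self.segment_size:
            self.segments.append(0)       # start a new segment
        self.segments[-1] += 1
        if self.max_segments is not None and len(self.segments) > self.max_segments:
            self.flush_memtable()

    def flush_memtable(self) -> None:
        # Once the memtable is flushed, every write so far is in an SSTable,
        # so in this model all segments except the active one are recycled.
        self.flushes += 1
        self.segments = [self.segments[-1]]


unbounded = CommitLogModel(segment_size=100)
capped = CommitLogModel(segment_size=100, max_segments=4)
for _ in range(10_000):
    unbounded.write()
    capped.write()

print("no flushes:  ", len(unbounded.segments), "live segments")
print("with flushes:", len(capped.segments), "live segments,", capped.flushes, "flushes")
```

In this model, removing the commit log removes the segment pressure entirely, which is the case asked about in the thread; memtable-size triggers such as memtable_cleanup_threshold would still apply.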