Kevin, Stefan, thanks for the positive feedback and questions. Stefan, the blog post is written generally based on Apache Cassandra defaults. The memtable cleanup threshold is 1/(1 + memtable_flush_writers), and memtable_flush_writers defaults to two. This comes to roughly 33 percent of the allocated memtable memory. I have updated the blog post to add this missing detail :)
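[A minimal illustrative sketch of the derivation above, assuming the defaults mentioned; this is not Cassandra source, just the arithmetic with names mirroring the cassandra.yaml settings.]

```java
// Sketch: how the default memtable_cleanup_threshold is derived from
// memtable_flush_writers (assumption: default of two flush writers).
public class DefaultCleanupThreshold {
    public static void main(String[] args) {
        int memtableFlushWriters = 2; // Cassandra default
        double cleanupThreshold = 1.0 / (1 + memtableFlushWriters);
        System.out.printf("memtable_cleanup_threshold defaults to %.2f (~33%% of memtable memory)%n",
                cleanupThreshold);
    }
}
```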
In the email I was trying to address the OP's original question. I mentioned .5 because the OP had set memtable_cleanup_threshold to 0.50, which is 50% of the allocated memtable memory. I was also mentioning that cleanup is triggered when either the on-heap or the off-heap memory reaches the cleanup threshold (a short sketch of this arithmetic appears after the quoted thread below). Please refer to https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/utils/memory/MemtableCleanerThread.java#L46-L49

I hope that helps.

Regards,
Akhil

> On 2/06/2017, at 2:04 AM, Stefan Litsche <stefan.lits...@zalando.de> wrote:
>
> Hello Akhil,
>
> thanks for your great blog post.
> One thing I cannot bring together:
> In the answer mail you write:
> "Note the cleanup threshold is .50 of 1GB and not a combination of heap and off heap space."
> In your blog post you write:
> "memtable_cleanup_threshold is the default value i.e. 33 percent of the total memtable heap and off heap memory."
>
> Could you clarify this?
>
> Thanks
> Stefan
>
>
> 2017-05-30 2:43 GMT+02:00 Akhil Mehra <akhilme...@gmail.com>:
> Hi Preetika,
>
> After thinking about your scenario I believe your small SSTable size might be due to data compression. By default, all tables enable SSTable compression.
>
> Let's go through your scenario. Let's say you have allocated 4GB to your Cassandra node. Your memtable_heap_space_in_mb and memtable_offheap_space_in_mb will roughly come to around 1GB. Since you have set memtable_cleanup_threshold to .50, memtable cleanup will be triggered when the total allocated memtable space exceeds 1/2GB. Note the cleanup threshold is .50 of 1GB and not a combination of heap and off heap space. This memtable allocation size is the total amount available for all tables on your node. This includes all system related keyspaces. The cleanup process will write the largest memtable to disk.
>
> For your case, I am assuming that you are on a single node with only one table with insert activity. I do not think the commit log will trigger a flush in this circumstance, as by default the commit log has 8192 MB of space unless the commit log is placed on a very small disk.
>
> I am assuming your table on disk is smaller than 500MB because of compression. You can disable compression on your table and see if this helps get the desired size.
>
> I have written up a blog post explaining memtable flushing (http://abiasforaction.net/apache-cassandra-memtable-flush/).
>
> Let me know if you have any other questions.
>
> I hope this helps.
>
> Regards,
> Akhil Mehra
>
>
> On Fri, May 26, 2017 at 6:58 AM, preetika tyagi <preetikaty...@gmail.com> wrote:
> I agree that for such a small amount of data, Cassandra is obviously not needed. However, this is purely an experimental setup through which I'm trying to understand how and exactly when a memtable flush is triggered. As I mentioned in my post, I read the documentation and tweaked the parameters accordingly so that I never hit a memtable flush, but it is still happening. As far as the setup is concerned, I'm just using 1 node, running Cassandra with the "cassandra -R" option, and then running some queries to insert some dummy data.
>
> I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml and add "durable_writes=false" in the keyspace_definition.
>
> @Daemeon - The previous post led to this post, but since I was unaware of memtable flushes and assumed a memtable flush wasn't happening, the previous post was about something else (throughput/latency etc.). This post is explicitly about exactly when the memtable is being dumped to disk. I didn't want to confuse two different goals; that's why I posted a new one.
>
> On Thu, May 25, 2017 at 10:38 AM, Avi Kivity <a...@scylladb.com> wrote:
> It doesn't have to fit in memory. If your key distribution has strong temporal locality, then a larger memtable that can coalesce overwrites greatly reduces the disk I/O load for the memtable flush and subsequent compactions. Of course, I have no idea if this is what the OP had in mind.
>
>
> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>> Sorry for the confusion. That was for the OP. I wrote it quickly right after waking up.
>>
>> What I'm asking is why does the OP want to keep his data in the memtable exclusively? If the goal is to "make reads fast", then just turn on row caching.
>>
>> If there's so little data that it fits in memory (300MB), and there aren't going to be any writes past the initial small dataset, why use Cassandra? It sounds like the wrong tool for this job. Sounds like something that could easily be stored in S3 and loaded into memory when the app is fired up.
>>
>> On Thu, May 25, 2017 at 8:06 AM Avi Kivity <a...@scylladb.com> wrote:
>> Not sure whether you're asking me or the original poster, but the more times data gets overwritten in a memtable, the less it has to be compacted later on (and even without overwrites, larger memtables result in less compaction).
>>
>> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>>> Why do you think keeping your data in the memtable is what you need to do?
>>> On Thu, May 25, 2017 at 7:16 AM Avi Kivity <a...@scylladb.com> wrote:
>>> Then it doesn't have to (it still may, for other reasons).
>>>
>>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>>> What if the commit log is disabled?
>>>>
>>>> On May 25, 2017 4:31 AM, "Avi Kivity" <a...@scylladb.com> wrote:
>>>> Cassandra has to flush the memtable occasionally, or the commit log grows without bounds.
>>>>
>>>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>>>> Hi,
>>>>>
>>>>> I'm running Cassandra with a very small dataset so that the data can exist in the memtable only. Below are my configurations:
>>>>>
>>>>> In jvm.options:
>>>>>
>>>>> -Xms4G
>>>>> -Xmx4G
>>>>>
>>>>> In cassandra.yaml:
>>>>>
>>>>> memtable_cleanup_threshold: 0.50
>>>>> memtable_allocation_type: heap_buffers
>>>>>
>>>>> As per the documentation in cassandra.yaml, memtable_heap_space_in_mb and memtable_offheap_space_in_mb will be set to 1/4 of the heap size, i.e. 1000MB.
>>>>>
>>>>> According to the documentation here (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), the memtable flush will trigger if the total size of the memtable(s) goes beyond (1000+1000)*0.50=1000MB.
>>>>>
>>>>> Now if I perform several write requests which result in almost ~300MB of data, the memtable still gets flushed, since I see SSTables being created on the file system (Data.db etc.), and I don't understand why.
>>>>>
>>>>> Could anyone explain this behavior and point out if I'm missing something here?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Preetika
>
> --
> Stefan Litsche | Mobile: +49 176 12759436 E-Mail: stefan.lits...@zalando.de
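[A minimal illustrative sketch of the flush-trigger arithmetic discussed in the thread above, referenced from the reply at the top. Assumptions: a 4GB heap as in the OP's jvm.options, the documented default of roughly 1/4 of the heap for each memtable pool, and the OP's memtable_cleanup_threshold of 0.50. This is not Cassandra source; cleanup fires when either pool crosses its threshold, not the combined total.]

```java
// Sketch of the OP's scenario: when a flush of the largest memtable would be triggered.
public class FlushTriggerEstimate {
    public static void main(String[] args) {
        int heapMb = 4096;                        // -Xms4G / -Xmx4G
        int memtableHeapSpaceMb = heapMb / 4;     // memtable_heap_space_in_mb default (~1GB)
        int memtableOffheapSpaceMb = heapMb / 4;  // memtable_offheap_space_in_mb default (~1GB)
        double cleanupThreshold = 0.50;           // OP's memtable_cleanup_threshold

        // Each pool is checked against its own threshold.
        double heapTriggerMb = memtableHeapSpaceMb * cleanupThreshold;
        double offheapTriggerMb = memtableOffheapSpaceMb * cleanupThreshold;
        System.out.printf(
                "Flush the largest memtable when on-heap usage exceeds ~%.0f MB or off-heap usage exceeds ~%.0f MB%n",
                heapTriggerMb, offheapTriggerMb);
    }
}
```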