I agree that for such a small dataset, Cassandra is obviously not needed.
However, this is purely an experimental setup through which I'm trying to
understand how and exactly when a memtable flush is triggered. As I mentioned
in my post, I read the documentation and tweaked the parameters so that I
should never hit a memtable flush, but it is still happening. As far as the
setup is concerned, I'm just using one node, starting Cassandra with the
"cassandra -R" option, and then running some queries to insert some dummy
data.
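For reference, the node itself is started and sanity-checked with nothing
more than the following (a rough sketch, run from CASSANDRA_HOME; "-R" just
allows running as root):

  # start a single local node
  bin/cassandra -R
  # confirm there is exactly one node in the ring
  bin/nodetool status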

I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml
and add "durable_writes=false" to the keyspace_definition.
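The only change to the profile is that durable_writes clause; roughly like
this (the keyspace name and replication settings are whatever the stock
profile ships with, I'm quoting them from memory):

  keyspace: stresscql
  keyspace_definition: |
    CREATE KEYSPACE stresscql WITH replication =
      {'class': 'SimpleStrategy', 'replication_factor': 1}
      AND durable_writes = false;

The inserts are then driven with cassandra-stress along these lines (the
counts and thread settings are illustrative, not the exact values I used):

  tools/bin/cassandra-stress user \
    profile=tools/cqlstress-insanity-example.yaml \
    "ops(insert=1)" n=50000 -rate threads=4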

@Daemeon - The previous post led to this one, but since I was unaware of the
memtable flush and assumed it wasn't happening, that post was about something
else (throughput/latency etc.). This post is explicitly about exactly when
the memtable is being dumped to disk. I didn't want to conflate two different
goals, which is why I posted a new one.
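In case it's useful, this is how I'm trying to catch the flush as it happens;
just a rough sketch, where the log path assumes the tarball layout and the
keyspace name assumes the stock stress profile:

  # watch the node log for flush activity
  tail -f logs/system.log | grep -iE "flush|memtable"
  # check memtable size and switch count for the stress keyspace
  bin/nodetool tablestats stresscql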

On Thu, May 25, 2017 at 10:38 AM, Avi Kivity <a...@scylladb.com> wrote:

> It doesn't have to fit in memory. If your key distribution has strong
> temporal locality, then a larger memtable that can coalesce overwrites
> greatly reduces the disk I/O load for the memtable flush and subsequent
> compactions. Of course, I have no idea if this is what the OP had in mind.
>
>
> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>
> Sorry for the confusion.  That was for the OP.  I wrote it quickly right
> after waking up.
>
> What I'm asking is why does the OP want to keep his data in the memtable
> exclusively?  If the goal is to "make reads fast", then just turn on row
> caching.
>
> If there's so little data that it fits in memory (300MB), and there aren't
> going to be any writes past the initial small dataset, why use Cassandra?
> It sounds like the wrong tool for this job.  Sounds like something that
> could easily be stored in S3 and loaded in memory when the app is fired up.
>
>
> On Thu, May 25, 2017 at 8:06 AM Avi Kivity <a...@scylladb.com> wrote:
>
>> Not sure whether you're asking me or the original poster, but the more
>> times data gets overwritten in a memtable, the less it has to be compacted
>> later on (and even without overwrites, larger memtables result in less
>> compaction).
>>
>> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>>
>> Why do you think keeping your data in the memtable is what you need to do?
>> On Thu, May 25, 2017 at 7:16 AM Avi Kivity <a...@scylladb.com> wrote:
>>
>>> Then it doesn't have to (it still may, for other reasons).
>>>
>>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>>
>>> What if the commit log is disabled?
>>>
>>> On May 25, 2017 4:31 AM, "Avi Kivity" <a...@scylladb.com> wrote:
>>>
>>>> Cassandra has to flush the memtable occasionally, or the commit log
>>>> grows without bounds.
>>>>
>>>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm running Cassandra with a very small dataset so that the data can
>>>> exist on memtable only. Below are my configurations:
>>>>
>>>> In jvm.options:
>>>>
>>>> -Xms4G
>>>> -Xmx4G
>>>>
>>>> In cassandra.yaml,
>>>>
>>>> memtable_cleanup_threshold: 0.50
>>>> memtable_allocation_type: heap_buffers
>>>>
>>>> As per the documentation in cassandra.yaml, the
>>>> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will each
>>>> be set to 1/4 of the heap size, i.e. 1000MB.
>>>>
>>>> According to the documentation here (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
>>>> the memtable flush will trigger only if the total size of the memtable(s)
>>>> goes beyond (1000+1000)*0.50 = 1000MB.
>>>>
>>>> Now when I perform several write requests that result in only ~300MB of
>>>> data, the memtable still gets flushed (I see SSTables such as Data.db
>>>> being created on the file system), and I don't understand why.
>>>>
>>>> Could anyone explain this behavior and point out if I'm missing
>>>> something here?
>>>>
>>>> Thanks,
>>>>
>>>> Preetika
>>>>
>>>>
>>>>
>>>
>>
>
