Re: How to avoid flush if the data can fit into memtable

2017-06-02 Thread preetika tyagi
Great explanation and blog post, Akhil.

Sorry for the delayed response (somehow didn't notice the email in my
inbox), but this is what I concluded as well.

In addition to compression, I believe the sstable is serialized as well, and
the combination of both results in a much smaller sstable size in comparison
to the in-memory memtable size, which holds all the data as Java objects.
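
(One rough way to eyeball this on a live node, assuming Cassandra 3.x where
"nodetool cfstats" was renamed "nodetool tablestats", and substituting your
own keyspace/table for the placeholder names:

nodetool tablestats my_keyspace.my_table

Then compare the "Memtable data size" line against "Space used (live)" for
the on-disk sstables.)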

I also did a small experiment for this. When I allocate 4GB of heap
(resulting in roughly 981MB for the memtable, as per your post) and then write
approx 920MB of data, it ends up writing some sstables. However, if I
increase the heap size to 120GB and write ~920MB of data again, it doesn't
write anything to an sstable. This clearly indicates that I need a bigger
heap size to avoid flushing.

One interesting fact, though: if I bring the heap size down to 64GB, which
means the memtable will roughly be around 16GB, and again write ~920MB of
data, it still writes some sstables. The ratio of 920MB of serialized +
compressed data to more than 16GB of in-memory memtable data looks a bit odd,
but I don't have a solid explanation for this behavior.
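
(For context, the back-of-the-envelope arithmetic I'm using, based on the
1/4-of-heap default and my memtable_cleanup_threshold of 0.50; these are
rough numbers that ignore per-object overhead:

4GB heap   -> ~1GB memtable space  -> cleanup at ~500MB
64GB heap  -> ~16GB memtable space -> cleanup at ~8GB
120GB heap -> ~30GB memtable space -> cleanup at ~15GB

By this math, ~920MB of raw writes should also stay below the 64GB-heap
trigger, which is why the flush there surprises me.)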

However, I'm not going to look into that further, so we can conclude this thread :)

Thank you all for your responses!

Preetika


On Fri, Jun 2, 2017 at 10:56 AM, Jeff Jirsa  wrote:

>
>
> On 2017-05-24 17:42 (-0700), preetika tyagi 
> wrote:
> > Hi,
> >
> > I'm running Cassandra with a very small dataset so that the data can
> exist
> > on memtable only. Below are my configurations:
> >
> > In jvm.options:
> >
> > -Xms4G
> > -Xmx4G
> >
> > In cassandra.yaml,
> >
> > memtable_cleanup_threshold: 0.50
> > memtable_allocation_type: heap_buffers
> >
> > As per the documentation in cassandra.yaml, the *memtable_heap_space_in_mb*
> > and *memtable_offheap_space_in_mb* will be set to 1/4 of the heap size,
> > i.e. 1000MB
> >
> > According to the documentation here (
> > http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
> > the memtable flush will trigger if the total size of memtable(s) goes
> > beyond (1000+1000)*0.50=1000MB.
>
> 1/4 of the heap (=1G) * .5 cleanup means cleanup happens at 500MB, or when
> the commitlog hits its max size. If you disable durable writes (removing the
> commitlog trigger), you're still flushing at 500MB.
>
> Recall that your 300MB of data also has associated metadata (timestamps,
> TTLs, etc.) that will increase the size beyond your nominal calculation
> from the user side.
>
> If you're sure you want to do this, set durable_writes=false and either
> raise the memtable_cleanup_threshold significantly, or raise your heap or
> memtable size.
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: How to avoid flush if the data can fit into memtable

2017-06-02 Thread Jeff Jirsa


On 2017-05-24 17:42 (-0700), preetika tyagi  wrote: 
> Hi,
> 
> I'm running Cassandra with a very small dataset so that the data can exist
> on memtable only. Below are my configurations:
> 
> In jvm.options:
> 
> -Xms4G
> -Xmx4G
> 
> In cassandra.yaml,
> 
> memtable_cleanup_threshold: 0.50
> memtable_allocation_type: heap_buffers
> 
> As per the documentation in cassandra.yaml, the *memtable_heap_space_in_mb*
> and *memtable_offheap_space_in_mb* will be set to 1/4 of the heap size, i.e. 1000MB
> 
> According to the documentation here (
> http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
> the memtable flush will trigger if the total size of memtable(s) goes beyond
> (1000+1000)*0.50=1000MB.

1/4 of the heap (=1G) * .5 cleanup means cleanup happens at 500MB, or when the
commitlog hits its max size. If you disable durable writes (removing the
commitlog trigger), you're still flushing at 500MB.

Recall that your 300MB of data also has associated metadata (timestamps,
TTLs, etc.) that will increase the size beyond your nominal calculation from
the user side.

If you're sure you want to do this, set durable_writes=false and either raise
the memtable_cleanup_threshold significantly, or raise your heap or memtable size.
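
A minimal sketch of what that could look like; the keyspace name and the
exact numbers are illustrative placeholders, not recommendations:

In cassandra.yaml:

memtable_cleanup_threshold: 0.90
memtable_heap_space_in_mb: 2048

And per keyspace, in cqlsh:

ALTER KEYSPACE my_keyspace WITH durable_writes = false;

Keep in mind that with durable_writes=false, anything still in the memtable
is lost if the node goes down before a flush.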


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: How to avoid flush if the data can fit into memtable

2017-06-01 Thread Akhil Mehra
Kevin and Stefan, thanks for the positive feedback and questions.

Stefan, in the blog post I am generally writing based on Apache Cassandra 
defaults. The memtable cleanup threshold is 1/(1 + memtable_flush_writers). By 
default, memtable_flush_writers is two. This comes to 33 percent of the 
allocated memory. I have updated the blog post to add this missing 
detail :)
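
As a quick worked example under those defaults:

memtable_cleanup_threshold = 1 / (1 + memtable_flush_writers)
                           = 1 / (1 + 2)
                           ~ 0.33

so cleanup kicks in once the memtables use roughly a third of the allocated
memtable space.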

In the email I was trying to address the OP's original question. I mentioned .5 
because the OP had set memtable_cleanup_threshold to .50, which is 50% of 
the allocated memory. I was also pointing out that cleanup is triggered when 
either on-heap or off-heap memory reaches the cleanup threshold. Please refer to 
https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/utils/memory/MemtableCleanerThread.java#L46-L49.

I hope that helps.

Regards,
Akhil



 
> On 2/06/2017, at 2:04 AM, Stefan Litsche  wrote:
> 
> Hello Akhil,
> 
> thanks for your great blog post.
> One thing I cannot reconcile:
> In the answer mail you write:
> "Note the cleanup threshold is .50 of 1GB and not a combination of heap and 
> off heap space."
> In your blog post you write:
> "memtable_cleanup_threshold is the default value i.e. 33 percent of the total 
> memtable heap and off heap memory."
> 
> Could you clarify this?
> 
> Thanks
> Stefan
> 
> 
> 2017-05-30 2:43 GMT+02:00 Akhil Mehra :
> Hi Preetika,
> 
> After thinking about your scenario I believe your small SSTable size might be 
> due to data compression. By default, all tables enable SSTable compression. 
> 
> Let's go through your scenario. Let's say you have allocated 4GB to your 
> Cassandra node. Your memtable_heap_space_in_mb and 
> memtable_offheap_space_in_mb will roughly come to around 1GB. Since you have 
> set memtable_cleanup_threshold to .50, table cleanup will be triggered when total 
> allocated memtable space exceeds 0.5GB. Note the cleanup threshold is .50 of 
> 1GB, not .50 of the combined heap and off-heap space. This memtable 
> allocation size is the total amount available for all tables on your node. 
> This includes all system related keyspaces. The cleanup process will write 
> the largest memtable to disk.
> 
> For your case, I am assuming that you are on a single node with only one 
> table with insert activity. I do not think the commit log will trigger a 
> flush in this circumstance as by default the commit log has 8192 MB of space 
> unless the commit log is placed on a very small disk. 
> 
> I am assuming your table on disk is smaller than 500MB because of 
> compression. You can disable compression on your table and see if this helps 
> get the desired size.
> 
> I have written up a blog post explaining memtable flushing 
> (http://abiasforaction.net/apache-cassandra-memtable-flush/)
> 
> Let me know if you have any other question. 
> 
> I hope this helps.
> 
> Regards,
> Akhil Mehra 
> 
> 
> On Fri, May 26, 2017 at 6:58 AM, preetika tyagi  
> wrote:
> I agree that for such a small data, Cassandra is obviously not needed. 
> However, this is purely an experimental setup by using which I'm trying to 
> understand how and exactly when memtable flush is triggered. As I mentioned 
> in my post, I read the documentation and tweaked the parameters accordingly 
> so that I never hit a memtable flush, but it is still doing that. As far as the 
> setup is concerned, I'm just using 1 node and running Cassandra using 
> "cassandra -R" option and then running some queries to insert some dummy data.
> 
> I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml 
> and add "durable_writes=false" in the keyspace_definition.
> 
> @Daemeon - The previous post led to this post, but since I was unaware of 
> memtable flush and I assumed memtable flush wasn't happening, the previous 
> post was about something else (throughput/latency etc.). This post is 
> explicitly about exactly when memtable is being dumped to the disk. Didn't 
> want to confuse two different goals that's why posted a new one.
> 
> On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:
> It doesn't have to fit in memory. If your key distribution has strong 
> temporal locality, then a larger memtable that can coalesce overwrites 
> greatly reduces the disk I/O load for the memtable flush and subsequent 
> compactions. Of course, I have no idea if this is what the OP had in mind.
> 
> 
> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>> Sorry for the confusion.  That was for the OP.  I wrote it quickly right 
>> after waking up.
>> 
>> What I'm asking is why does the OP want to keep his data in the memtable 
>> exclusively?  If the goal is to "make reads fast", then just turn on row 
>> caching.  
>> 

Re: How to avoid flush if the data can fit into memtable

2017-06-01 Thread Stefan Litsche
Hello Akhil,

thanks for your great blog post.
One thing I cannot reconcile:
In the answer mail you write:
"Note the cleanup threshold is .50 of 1GB and not a combination of heap and
off heap space."
In your blog post you write:
"memtable_cleanup_threshold is the default value i.e. 33 percent of the
total memtable heap and off heap memory."

Could you clarify this?

Thanks
Stefan


2017-05-30 2:43 GMT+02:00 Akhil Mehra :

> Hi Preetika,
>
> After thinking about your scenario I believe your small SSTable size might
> be due to data compression. By default, all tables enable SSTable
> compression.
>
> Let's go through your scenario. Let's say you have allocated 4GB to your
> Cassandra node. Your *memtable_heap_space_in_mb* and
> *memtable_offheap_space_in_mb* will roughly come to around 1GB. Since
> you have set memtable_cleanup_threshold to .50, table cleanup will be
> triggered when total allocated memtable space exceeds 0.5GB. Note the
> cleanup threshold is .50 of 1GB, not .50 of the combined heap and off-heap
> space. This memtable allocation size is the total amount available for all
> tables on your node. This includes all system related keyspaces. The
> cleanup process will write the largest memtable to disk.
>
> For your case, I am assuming that you are on a *single node with only one
> table with insert activity*. I do not think the commit log will trigger a
> flush in this circumstance as by default the commit log has 8192 MB of
> space unless the commit log is placed on a very small disk.
>
> I am assuming your table on disk is smaller than 500MB because of
> compression. You can disable compression on your table and see if this
> helps get the desired size.
>
> I have written up a blog post explaining memtable flushing (
> http://abiasforaction.net/apache-cassandra-memtable-flush/)
>
> Let me know if you have any other question.
>
> I hope this helps.
>
> Regards,
> Akhil Mehra
>
>
> On Fri, May 26, 2017 at 6:58 AM, preetika tyagi 
> wrote:
>
>> I agree that for such a small data, Cassandra is obviously not needed.
>> However, this is purely an experimental setup by using which I'm trying to
>> understand how and exactly when memtable flush is triggered. As I mentioned
>> in my post, I read the documentation and tweaked the parameters accordingly
>> so that I never hit a memtable flush, but it is still doing that. As far as
>> the setup is concerned, I'm just using 1 node and running Cassandra using
>> "cassandra -R" option and then running some queries to insert some dummy
>> data.
>>
>> I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml
>> and add "durable_writes=false" in the keyspace_definition.
>>
>> @Daemeon - The previous post led to this post, but since I was unaware of
>> memtable flush and I assumed memtable flush wasn't happening, the previous
>> post was about something else (throughput/latency etc.). This post is
>> explicitly about exactly when memtable is being dumped to the disk. Didn't
>> want to confuse two different goals that's why posted a new one.
>>
>> On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:
>>
>>> It doesn't have to fit in memory. If your key distribution has strong
>>> temporal locality, then a larger memtable that can coalesce overwrites
>>> greatly reduces the disk I/O load for the memtable flush and subsequent
>>> compactions. Of course, I have no idea if this is what the OP had in mind.
>>>
>>>
>>> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>>>
>>> Sorry for the confusion.  That was for the OP.  I wrote it quickly right
>>> after waking up.
>>>
>>> What I'm asking is why does the OP want to keep his data in the memtable
>>> exclusively?  If the goal is to "make reads fast", then just turn on row
>>> caching.
>>>
>>> If there's so little data that it fits in memory (300MB), and there
>>> aren't going to be any writes past the initial small dataset, why use
>>> Cassandra?  It sounds like the wrong tool for this job.  Sounds like
>>> something that could easily be stored in S3 and loaded in memory when the
>>> app is fired up.
>>>
>>> On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:
>>>
 Not sure whether you're asking me or the original poster, but the more
 times data gets overwritten in a memtable, the less it has to be compacted
 later on (and even without overwrites, larger memtables result in less
 compaction).

 On 05/25/2017 05:59 PM, Jonathan Haddad wrote:

 Why do you think keeping your data in the memtable is what you need
 to do?
 On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:

> Then it doesn't have to (it still may, for other reasons).
>
> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>
> What if the commit log is disabled?
>
> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>
>> Cassandra has to flush the 

Re: How to avoid flush if the data can fit into memtable

2017-05-31 Thread Kevin O'Connor
Great post, Akhil! Thanks for explaining that.

On Mon, May 29, 2017 at 5:43 PM, Akhil Mehra  wrote:

> Hi Preetika,
>
> After thinking about your scenario I believe your small SSTable size might
> be due to data compression. By default, all tables enable SSTable
> compression.
>
> Let's go through your scenario. Let's say you have allocated 4GB to your
> Cassandra node. Your *memtable_heap_space_in_mb* and
> *memtable_offheap_space_in_mb* will roughly come to around 1GB. Since
> you have set memtable_cleanup_threshold to .50, table cleanup will be
> triggered when total allocated memtable space exceeds 0.5GB. Note the
> cleanup threshold is .50 of 1GB, not .50 of the combined heap and off-heap
> space. This memtable allocation size is the total amount available for all
> tables on your node. This includes all system related keyspaces. The
> cleanup process will write the largest memtable to disk.
>
> For your case, I am assuming that you are on a *single node with only one
> table with insert activity*. I do not think the commit log will trigger a
> flush in this circumstance as by default the commit log has 8192 MB of
> space unless the commit log is placed on a very small disk.
>
> I am assuming your table on disk is smaller than 500MB because of
> compression. You can disable compression on your table and see if this
> helps get the desired size.
>
> I have written up a blog post explaining memtable flushing (
> http://abiasforaction.net/apache-cassandra-memtable-flush/)
>
> Let me know if you have any other question.
>
> I hope this helps.
>
> Regards,
> Akhil Mehra
>
>
> On Fri, May 26, 2017 at 6:58 AM, preetika tyagi 
> wrote:
>
>> I agree that for such a small data, Cassandra is obviously not needed.
>> However, this is purely an experimental setup by using which I'm trying to
>> understand how and exactly when memtable flush is triggered. As I mentioned
>> in my post, I read the documentation and tweaked the parameters accordingly
>> so that I never hit a memtable flush, but it is still doing that. As far as
>> the setup is concerned, I'm just using 1 node and running Cassandra using
>> "cassandra -R" option and then running some queries to insert some dummy
>> data.
>>
>> I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml
>> and add "durable_writes=false" in the keyspace_definition.
>>
>> @Daemeon - The previous post led to this post, but since I was unaware of
>> memtable flush and I assumed memtable flush wasn't happening, the previous
>> post was about something else (throughput/latency etc.). This post is
>> explicitly about exactly when memtable is being dumped to the disk. Didn't
>> want to confuse two different goals that's why posted a new one.
>>
>> On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:
>>
>>> It doesn't have to fit in memory. If your key distribution has strong
>>> temporal locality, then a larger memtable that can coalesce overwrites
>>> greatly reduces the disk I/O load for the memtable flush and subsequent
>>> compactions. Of course, I have no idea if this is what the OP had in mind.
>>>
>>>
>>> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>>>
>>> Sorry for the confusion.  That was for the OP.  I wrote it quickly right
>>> after waking up.
>>>
>>> What I'm asking is why does the OP want to keep his data in the memtable
>>> exclusively?  If the goal is to "make reads fast", then just turn on row
>>> caching.
>>>
>>> If there's so little data that it fits in memory (300MB), and there
>>> aren't going to be any writes past the initial small dataset, why use
>>> Cassandra?  It sounds like the wrong tool for this job.  Sounds like
>>> something that could easily be stored in S3 and loaded in memory when the
>>> app is fired up.
>>>
>>> On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:
>>>
 Not sure whether you're asking me or the original poster, but the more
 times data gets overwritten in a memtable, the less it has to be compacted
 later on (and even without overwrites, larger memtables result in less
 compaction).

 On 05/25/2017 05:59 PM, Jonathan Haddad wrote:

 Why do you think keeping your data in the memtable is what you need
 to do?
 On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:

> Then it doesn't have to (it still may, for other reasons).
>
> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>
> What if the commit log is disabled?
>
> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>
>> Cassandra has to flush the memtable occasionally, or the commit log
>> grows without bounds.
>>
>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>
>> Hi,
>>
>> I'm running Cassandra with a very small dataset so that the data can
>> exist on memtable only. Below are my configurations:
>>
>> In jvm.options:
>>
>> 

Re: How to avoid flush if the data can fit into memtable

2017-05-29 Thread Akhil Mehra
Hi Preetika,

After thinking about your scenario I believe your small SSTable size might
be due to data compression. By default, all tables enable SSTable
compression.

Let's go through your scenario. Let's say you have allocated 4GB to your
Cassandra node. Your *memtable_heap_space_in_mb* and
*memtable_offheap_space_in_mb* will roughly come to around 1GB. Since you
have set memtable_cleanup_threshold to .50, table cleanup will be triggered when
total allocated memtable space exceeds 0.5GB. Note the cleanup threshold is
.50 of 1GB, not .50 of the combined heap and off-heap space. This memtable
allocation size is the total amount available for all tables on your node.
This includes all system related keyspaces. The cleanup process will write
the largest memtable to disk.

For your case, I am assuming that you are on a *single node with only one
table with insert activity*. I do not think the commit log will trigger a
flush in this circumstance as by default the commit log has 8192 MB of
space unless the commit log is placed on a very small disk.
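
For reference, the relevant knob is in cassandra.yaml; this is just the
default, shown for orientation:

commitlog_total_space_in_mb: 8192

When the total commit log size exceeds this value, Cassandra flushes the
memtables holding data from the oldest segments so those segments can be
recycled.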

I am assuming your table on disk is smaller than 500MB because of
compression. You can disable compression on your table and see if this
helps get the desired size.
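
A minimal sketch of disabling compression in cqlsh; the keyspace and table
names are placeholders, and the exact compression map syntax can vary
slightly between Cassandra versions:

ALTER TABLE my_keyspace.my_table WITH compression = {'enabled': 'false'};

Only sstables written after this change are affected; existing ones can be
rewritten with "nodetool upgradesstables -a" if you want to compare sizes.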

I have written up a blog post explaining memtable flushing (
http://abiasforaction.net/apache-cassandra-memtable-flush/)

Let me know if you have any other questions.

I hope this helps.

Regards,
Akhil Mehra


On Fri, May 26, 2017 at 6:58 AM, preetika tyagi 
wrote:

> I agree that for such a small data, Cassandra is obviously not needed.
> However, this is purely an experimental setup by using which I'm trying to
> understand how and exactly when memtable flush is triggered. As I mentioned
> in my post, I read the documentation and tweaked the parameters accordingly
> so that I never hit a memtable flush, but it is still doing that. As far as
> the setup is concerned, I'm just using 1 node and running Cassandra using
> "cassandra -R" option and then running some queries to insert some dummy
> data.
>
> I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml
> and add "durable_writes=false" in the keyspace_definition.
>
> @Daemeon - The previous post led to this post, but since I was unaware of
> memtable flush and I assumed memtable flush wasn't happening, the previous
> post was about something else (throughput/latency etc.). This post is
> explicitly about exactly when memtable is being dumped to the disk. Didn't
> want to confuse two different goals that's why posted a new one.
>
> On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:
>
>> It doesn't have to fit in memory. If your key distribution has strong
>> temporal locality, then a larger memtable that can coalesce overwrites
>> greatly reduces the disk I/O load for the memtable flush and subsequent
>> compactions. Of course, I have no idea if this is what the OP had in mind.
>>
>>
>> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>>
>> Sorry for the confusion.  That was for the OP.  I wrote it quickly right
>> after waking up.
>>
>> What I'm asking is why does the OP want to keep his data in the memtable
>> exclusively?  If the goal is to "make reads fast", then just turn on row
>> caching.
>>
>> If there's so little data that it fits in memory (300MB), and there
>> aren't going to be any writes past the initial small dataset, why use
>> Cassandra?  It sounds like the wrong tool for this job.  Sounds like
>> something that could easily be stored in S3 and loaded in memory when the
>> app is fired up.
>>
>> On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:
>>
>>> Not sure whether you're asking me or the original poster, but the more
>>> times data gets overwritten in a memtable, the less it has to be compacted
>>> later on (and even without overwrites, larger memtables result in less
>>> compaction).
>>>
>>> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>>>
>>> Why do you think keeping your data in the memtable is what you need to
>>> do?
>>> On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:
>>>
 Then it doesn't have to (it still may, for other reasons).

 On 05/25/2017 05:11 PM, preetika tyagi wrote:

 What if the commit log is disabled?

 On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:

> Cassandra has to flush the memtable occasionally, or the commit log
> grows without bounds.
>
> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>
> Hi,
>
> I'm running Cassandra with a very small dataset so that the data can
> exist on memtable only. Below are my configurations:
>
> In jvm.options:
>
> -Xms4G
> -Xmx4G
>
> In cassandra.yaml,
>
> memtable_cleanup_threshold: 0.50
> memtable_allocation_type: heap_buffers
>
> As per the documentation in cassandra.yaml, the
> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* 

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread preetika tyagi
I agree that for such a small amount of data, Cassandra is obviously not needed.
However, this is purely an experimental setup by using which I'm trying to
understand how and exactly when memtable flush is triggered. As I mentioned
in my post, I read the documentation and tweaked the parameters accordingly
so that I never hit a memtable flush, but it is still doing that. As far as
the setup is concerned, I'm just using 1 node and running Cassandra using
"cassandra -R" option and then running some queries to insert some dummy
data.

I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml
and add "durable_writes=false" in the keyspace_definition.

@Daemeon - The previous post led to this post, but since I was unaware of
memtable flush and I assumed memtable flush wasn't happening, the previous
post was about something else (throughput/latency etc.). This post is
explicitly about exactly when the memtable is being dumped to disk. I didn't
want to confuse two different goals; that's why I posted a new one.

On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:

> It doesn't have to fit in memory. If your key distribution has strong
> temporal locality, then a larger memtable that can coalesce overwrites
> greatly reduces the disk I/O load for the memtable flush and subsequent
> compactions. Of course, I have no idea if this is what the OP had in mind.
>
>
> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>
> Sorry for the confusion.  That was for the OP.  I wrote it quickly right
> after waking up.
>
> What I'm asking is why does the OP want to keep his data in the memtable
> exclusively?  If the goal is to "make reads fast", then just turn on row
> caching.
>
> If there's so little data that it fits in memory (300MB), and there aren't
> going to be any writes past the initial small dataset, why use Cassandra?
> It sounds like the wrong tool for this job.  Sounds like something that
> could easily be stored in S3 and loaded in memory when the app is fired up.
>
>
> On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:
>
>> Not sure whether you're asking me or the original poster, but the more
>> times data gets overwritten in a memtable, the less it has to be compacted
>> later on (and even without overwrites, larger memtables result in less
>> compaction).
>>
>> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>>
>> Why do you think keeping your data in the memtable is what you need to
>> do?
>> On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:
>>
>>> Then it doesn't have to (it still may, for other reasons).
>>>
>>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>>
>>> What if the commit log is disabled?
>>>
>>> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>>>
 Cassandra has to flush the memtable occasionally, or the commit log
 grows without bounds.

 On 05/25/2017 03:42 AM, preetika tyagi wrote:

 Hi,

 I'm running Cassandra with a very small dataset so that the data can
 exist on memtable only. Below are my configurations:

 In jvm.options:

 -Xms4G
 -Xmx4G

 In cassandra.yaml,

 memtable_cleanup_threshold: 0.50
 memtable_allocation_type: heap_buffers

 As per the documentation in cassandra.yaml, the
 *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will be
 set to 1/4 of the heap size, i.e. 1000MB

 According to the documentation here
 (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), the
 memtable flush will trigger if the total size of memtable(s) goes beyond
 (1000+1000)*0.50=1000MB.

 Now if I perform several write requests which results in almost ~300MB
 of the data, memtable still gets flushed since I see sstables being created
 on file system (Data.db etc.) and I don't understand why.

 Could anyone explain this behavior and point out if I'm missing
 something here?

 Thanks,

 Preetika



>>>
>>
>


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
It doesn't have to fit in memory. If your key distribution has strong 
temporal locality, then a larger memtable that can coalesce overwrites 
greatly reduces the disk I/O load for the memtable flush and subsequent 
compactions. Of course, I have no idea if this is what the OP had in mind.


On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
Sorry for the confusion.  That was for the OP.  I wrote it quickly 
right after waking up.


What I'm asking is why does the OP want to keep his data in the 
memtable exclusively?  If the goal is to "make reads fast", then just 
turn on row caching.


If there's so little data that it fits in memory (300MB), and there 
aren't going to be any writes past the initial small dataset, why use 
Cassandra?  It sounds like the wrong tool for this job.  Sounds like 
something that could easily be stored in S3 and loaded in memory when 
the app is fired up.


On Thu, May 25, 2017 at 8:06 AM Avi Kivity wrote:


Not sure whether you're asking me or the original poster, but the
more times data gets overwritten in a memtable, the less it has to
be compacted later on (and even without overwrites, larger
memtables result in less compaction).


On 05/25/2017 05:59 PM, Jonathan Haddad wrote:

Why do you think keeping your data in the memtable is what you
need to do?
On Thu, May 25, 2017 at 7:16 AM Avi Kivity wrote:

Then it doesn't have to (it still may, for other reasons).


On 05/25/2017 05:11 PM, preetika tyagi wrote:

What if the commit log is disabled?

On May 25, 2017 4:31 AM, "Avi Kivity" wrote:

Cassandra has to flush the memtable occasionally, or the
commit log grows without bounds.


On 05/25/2017 03:42 AM, preetika tyagi wrote:

Hi,

I'm running Cassandra with a very small dataset so that
the data can exist on memtable only. Below are my
configurations:

In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml, the
/memtable_heap_space_in_mb/ and
/memtable_offheap_space_in_mb/ will be set to 1/4 of the heap
size, i.e. 1000MB

According to the documentation here
(http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
the memtable flush will trigger if the total size of
memtable(s) goes beyond (1000+1000)*0.50=1000MB.

Now if I perform several write requests which results
in almost ~300MB of the data, memtable still gets
flushed since I see sstables being created on file
system (Data.db etc.) and I don't understand why.

Could anyone explain this behavior and point out if I'm
missing something here?

Thanks,

Preetika











Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread daemeon reiydelle
This sounds exactly like a previous post that ended when I asked the person
to document the number of nodes, EC2 instance type, and size. I suspected a
single-node system. So the poster reposts? Hmm.

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 25, 2017 9:14 AM, "Jonathan Haddad"  wrote:

Sorry for the confusion.  That was for the OP.  I wrote it quickly right
after waking up.

What I'm asking is why does the OP want to keep his data in the memtable
exclusively?  If the goal is to "make reads fast", then just turn on row
caching.

If there's so little data that it fits in memory (300MB), and there aren't
going to be any writes past the initial small dataset, why use Cassandra?
It sounds like the wrong tool for this job.  Sounds like something that
could easily be stored in S3 and loaded in memory when the app is fired up.


On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:

> Not sure whether you're asking me or the original poster, but the more
> times data gets overwritten in a memtable, the less it has to be compacted
> later on (and even without overwrites, larger memtables result in less
> compaction).
>
> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>
> Why do you think keeping your data in the memtable is what you need to
> do?
> On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:
>
>> Then it doesn't have to (it still may, for other reasons).
>>
>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>
>> What if the commit log is disabled?
>>
>> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>>
>>> Cassandra has to flush the memtable occasionally, or the commit log
>>> grows without bounds.
>>>
>>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>>
>>> Hi,
>>>
>>> I'm running Cassandra with a very small dataset so that the data can
>>> exist on memtable only. Below are my configurations:
>>>
>>> In jvm.options:
>>>
>>> -Xms4G
>>> -Xmx4G
>>>
>>> In cassandra.yaml,
>>>
>>> memtable_cleanup_threshold: 0.50
>>> memtable_allocation_type: heap_buffers
>>>
>>> As per the documentation in cassandra.yaml, the
>>> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will be set
>>> to 1/4 of the heap size, i.e. 1000MB
>>>
>>> According to the documentation here
>>> (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), the
>>> memtable flush will trigger if the total size of memtable(s) goes beyond
>>> (1000+1000)*0.50=1000MB.
>>>
>>> Now if I perform several write requests which results in almost ~300MB
>>> of the data, memtable still gets flushed since I see sstables being created
>>> on file system (Data.db etc.) and I don't understand why.
>>>
>>> Could anyone explain this behavior and point out if I'm missing
>>> something here?
>>>
>>> Thanks,
>>>
>>> Preetika
>>>
>>>
>>>
>>
>


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Jonathan Haddad
Sorry for the confusion.  That was for the OP.  I wrote it quickly right
after waking up.

What I'm asking is why does the OP want to keep his data in the memtable
exclusively?  If the goal is to "make reads fast", then just turn on row
caching.

If there's so little data that it fits in memory (300MB), and there aren't
going to be any writes past the initial small dataset, why use Cassandra?
It sounds like the wrong tool for this job.  Sounds like something that
could easily be stored in S3 and loaded in memory when the app is fired up.


On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:

> Not sure whether you're asking me or the original poster, but the more
> times data gets overwritten in a memtable, the less it has to be compacted
> later on (and even without overwrites, larger memtables result in less
> compaction).
>
> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>
> Why do you think keeping your data in the memtable is what you need to
> do?
> On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:
>
>> Then it doesn't have to (it still may, for other reasons).
>>
>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>
>> What if the commit log is disabled?
>>
>> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>>
>>> Cassandra has to flush the memtable occasionally, or the commit log
>>> grows without bounds.
>>>
>>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>>
>>> Hi,
>>>
>>> I'm running Cassandra with a very small dataset so that the data can
>>> exist on memtable only. Below are my configurations:
>>>
>>> In jvm.options:
>>>
>>> -Xms4G
>>> -Xmx4G
>>>
>>> In cassandra.yaml,
>>>
>>> memtable_cleanup_threshold: 0.50
>>> memtable_allocation_type: heap_buffers
>>>
>>> As per the documentation in cassandra.yaml, the
>>> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will be set
>>> to 1/4 of the heap size, i.e. 1000MB
>>>
>>> According to the documentation here (
>>> http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
>>> the memtable flush will trigger if the total size of memtable(s) goes beyond
>>> (1000+1000)*0.50=1000MB.
>>>
>>> Now if I perform several write requests which results in almost ~300MB
>>> of the data, memtable still gets flushed since I see sstables being created
>>> on file system (Data.db etc.) and I don't understand why.
>>>
>>> Could anyone explain this behavior and point out if I'm missing
>>> something here?
>>>
>>> Thanks,
>>>
>>> Preetika
>>>
>>>
>>>
>>
>


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
Not sure whether you're asking me or the original poster, but the more 
times data gets overwritten in a memtable, the less it has to be 
compacted later on (and even without overwrites, larger memtables result 
in less compaction).



On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
Why do you think keeping your data in the memtable is what you need
to do?
On Thu, May 25, 2017 at 7:16 AM Avi Kivity wrote:


Then it doesn't have to (it still may, for other reasons).


On 05/25/2017 05:11 PM, preetika tyagi wrote:

What if the commit log is disabled?

On May 25, 2017 4:31 AM, "Avi Kivity" wrote:

Cassandra has to flush the memtable occasionally, or the
commit log grows without bounds.


On 05/25/2017 03:42 AM, preetika tyagi wrote:

Hi,

I'm running Cassandra with a very small dataset so that the
data can exist on memtable only. Below are my configurations:

In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml, the
/memtable_heap_space_in_mb/ and
/memtable_offheap_space_in_mb/ will be set to 1/4 of the heap size,
i.e. 1000MB

According to the documentation here
(http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
the memtable flush will trigger if the total size of
memtable(s) goes beyond (1000+1000)*0.50=1000MB.

Now if I perform several write requests which results in
almost ~300MB of the data, memtable still gets flushed since
I see sstables being created on file system (Data.db etc.)
and I don't understand why.

Could anyone explain this behavior and point out if I'm
missing something here?

Thanks,

Preetika









Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Jonathan Haddad
Why do you think keeping your data in the memtable is what you need to do?
On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:

> Then it doesn't have to (it still may, for other reasons).
>
> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>
> What if the commit log is disabled?
>
> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>
>> Cassandra has to flush the memtable occasionally, or the commit log grows
>> without bounds.
>>
>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>
>> Hi,
>>
>> I'm running Cassandra with a very small dataset so that the data can
>> exist on memtable only. Below are my configurations:
>>
>> In jvm.options:
>>
>> -Xms4G
>> -Xmx4G
>>
>> In cassandra.yaml,
>>
>> memtable_cleanup_threshold: 0.50
>> memtable_allocation_type: heap_buffers
>>
>> As per the documentation in cassandra.yaml, the
>> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will be set
>> to 1/4 of the heap size, i.e. 1000MB
>>
>> According to the documentation here (
>> http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
>> the memtable flush will trigger if the total size of memtable(s) goes beyond
>> (1000+1000)*0.50=1000MB.
>>
>> Now if I perform several write requests which results in almost ~300MB of
>> the data, memtable still gets flushed since I see sstables being created on
>> file system (Data.db etc.) and I don't understand why.
>>
>> Could anyone explain this behavior and point out if I'm missing something
>> here?
>>
>> Thanks,
>>
>> Preetika
>>
>>
>>
>


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity

Then it doesn't have to (it still may, for other reasons).


On 05/25/2017 05:11 PM, preetika tyagi wrote:

What if the commit log is disabled?

On May 25, 2017 4:31 AM, "Avi Kivity" wrote:


Cassandra has to flush the memtable occasionally, or the commit
log grows without bounds.


On 05/25/2017 03:42 AM, preetika tyagi wrote:

Hi,

I'm running Cassandra with a very small dataset so that the data
can exist on memtable only. Below are my configurations:

In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml, the
/memtable_heap_space_in_mb/ and /memtable_offheap_space_in_mb/ will
be set to 1/4 of the heap size, i.e. 1000MB

According to the documentation here
(http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
the memtable flush will trigger if the total size of memtable(s)
goes beyond (1000+1000)*0.50=1000MB.

Now if I perform several write requests which results in almost
~300MB of the data, memtable still gets flushed since I see
sstables being created on file system (Data.db etc.) and I don't
understand why.

Could anyone explain this behavior and point out if I'm missing
something here?

Thanks,

Preetika







Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread preetika tyagi
What if the commit log is disabled?

On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:

> Cassandra has to flush the memtable occasionally, or the commit log grows
> without bounds.
>
> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>
> Hi,
>
> I'm running Cassandra with a very small dataset so that the data can
> exist on memtable only. Below are my configurations:
>
> In jvm.options:
>
> -Xms4G
> -Xmx4G
>
> In cassandra.yaml,
>
> memtable_cleanup_threshold: 0.50
> memtable_allocation_type: heap_buffers
>
> As per the documentation in cassandra.yaml, the
> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will be set
> to 1/4 of the heap size, i.e. 1000MB
>
> According to the documentation here
> (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), the memtable
> flush will trigger if the total size of memtable(s) goes beyond
> (1000+1000)*0.50=1000MB.
>
> Now if I perform several write requests which results in almost ~300MB of
> the data, memtable still gets flushed since I see sstables being created on
> file system (Data.db etc.) and I don't understand why.
>
> Could anyone explain this behavior and point out if I'm missing something
> here?
>
> Thanks,
>
> Preetika
>
>
>


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
Cassandra has to flush the memtable occasionally, or the commit log 
grows without bounds.



On 05/25/2017 03:42 AM, preetika tyagi wrote:

Hi,

I'm running Cassandra with a very small dataset so that the data can 
exist on memtable only. Below are my configurations:


In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml, the
/memtable_heap_space_in_mb/ and /memtable_offheap_space_in_mb/ will be
set to 1/4 of the heap size, i.e. 1000MB


According to the documentation here 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold), 
the memtable flush will trigger if the total size of memtable(s) goes 
beyond (1000+1000)*0.50=1000MB.


Now if I perform several write requests which results in almost ~300MB 
of the data, memtable still gets flushed since I see sstables being 
created on file system (Data.db etc.) and I don't understand why.


Could anyone explain this behavior and point out if I'm missing 
something here?


Thanks,

Preetika