Re: memtable mem usage off by 10?

2014-06-11 Thread Idrén , Johan
Sorry for the slow reply, here’s the output:

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

What behaviour should I expect to see if I started using 2.1.x instead? Should 
the reported memtable memory usage correlate more closely with what I have 
configured, since there should be less overhead?

--
Johan Idrén



On 5 Jun 2014 at 16:46:48, Benedict Elliott Smith 
(belliottsm...@datastax.com) wrote:

What does

/usr/java/latest/bin/java -version

print?


On 5 June 2014 08:15, Idrén, Johan 
johan.id...@dice.se wrote:
I’m using the datastax rpms, using the bundled launch scripts.

grep -i jamm *
cassandra-env.sh:# add the jamm javaagent
cassandra-env.sh:JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"

And it’s part of the commandline used to start cassandra:

/usr/java/latest/bin/java -ea 
-javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G -Xmn2400M 
-XX:+HeapDumpOnOutOfMemoryError -Xss240k -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
-XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSIncrementalMode -XX:+UseCondCardMark 
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true 
-Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp 
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.7.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.7.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.7.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.3.jar
 org.apache.cassandra.service.CassandraDaemon



From: Benedict Elliott Smith belliottsm...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Wednesday 4 June 2014 17:18

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

In that case I would assume the problem is that for some reason JAMM is failing 
to load, and so the liveRatio it would ordinarily calculate is defaulting to 10 
- are you using the bundled cassandra launch scripts?
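
A quick way to check that on a node (a sketch, assuming the stock RPM log path
/var/log/cassandra/system.log; the exact wording of the messages can vary by
version) is to look in the server log, since Cassandra should log a MemoryMeter
warning if jamm is unavailable and otherwise logs the liveRatio values it
calculates per column family:

# If jamm failed to load, a MemoryMeter warning appears and the fixed fallback
# ratio is used; otherwise recent per-CF liveRatio calculations are logged.
grep -i 'memorymeter\|liveratio' /var/log/cassandra/system.log | tail -5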


On 4 June 2014 15:51, Idrén, Johan 
johan.id...@dice.se wrote:
I wasn’t supplying it, I was assuming it was using the default. It does not 
exist in my config file. Sorry for the confusion.



From: Benedict Elliott Smith belliottsm...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Wednesday 4 June 2014 16:36
To: user@cassandra.apache.org

Subject: Re: memtable mem usage off by 10?

Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims that the property is around in 2.0.

But something else is wrong, as Cassandra will crash if you supply an invalid 
property, implying it's not sourcing the config file you're using.

I'm afraid I don't have the context for why it was removed, but it happened as 
part

Re: memtable mem usage off by 10?

2014-06-05 Thread Idrén , Johan
I’m using the datastax rpms, using the bundled launch scripts.

grep -i jamm *
cassandra-env.sh:# add the jamm javaagent
cassandra-env.sh:JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"

And it’s part of the commandline used to start cassandra:

/usr/java/latest/bin/java -ea 
-javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G -Xmn2400M 
-XX:+HeapDumpOnOutOfMemoryError -Xss240k -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
-XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSIncrementalMode -XX:+UseCondCardMark 
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true 
-Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp 
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.7.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.7.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.7.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.3.jar
 org.apache.cassandra.service.CassandraDaemon



From: Benedict Elliott Smith belliottsm...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Wednesday 4 June 2014 17:18
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

In that case I would assume the problem is that for some reason JAMM is failing 
to load, and so the liveRatio it would ordinarily calculate is defaulting to 10 
- are you using the bundled cassandra launch scripts?


On 4 June 2014 15:51, Idrén, Johan 
johan.id...@dice.se wrote:
I wasn’t supplying it, I was assuming it was using the default. It does not 
exist in my config file. Sorry for the confusion.



From: Benedict Elliott Smith belliottsm...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Wednesday 4 June 2014 16:36
To: user@cassandra.apache.org

Subject: Re: memtable mem usage off by 10?

Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims that the property is around in 2.0.

But something else is wrong, as Cassandra will crash if you supply an invalid 
property, implying it's not sourcing the config file you're using.

I'm afraid I don't have the context for why it was removed, but it happened as 
part of the 2.0 release.


On 4 June 2014 13:59, Jack Krupansky 
j...@basetechnology.com wrote:
Yeah, it is in the doc:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html

And I don’t find a Jira issue mentioning it being removed, so... what’s the 
full story there?!

-- Jack Krupansky

From: Idrén, Johan johan.id...@dice.se
Sent: Wednesday, June 4, 2014 8:26 AM
To: user@cassandra.apache.org
Subject: RE: memtable mem usage off by 10?


Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims

Re: memtable mem usage off by 10?

2014-06-05 Thread Benedict Elliott Smith
What does

/usr/java/latest/bin/java -version

print?


On 5 June 2014 08:15, Idrén, Johan johan.id...@dice.se wrote:

  I’m using the datastax rpms, using the bundled launch scripts.

  grep -i jamm *
 cassandra-env.sh:# add the jamm javaagent
 cassandra-env.sh:JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"

  And it’s part of the commandline used to start cassandra:

  /usr/java/latest/bin/java -ea
 -javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
 -Xmn2400M -XX:+HeapDumpOnOutOfMemoryError -Xss240k -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSIncrementalMode
 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true
 -Dcom.sun.management.jmxremote.port=7199
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false
 -Dlog4j.configuration=log4j-server.properties
 -Dlog4j.defaultInitOverride=true
 -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp
 /etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.7.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.7.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.7.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.3.jar
 org.apache.cassandra.service.CassandraDaemon



   From: Benedict Elliott Smith belliottsm...@datastax.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Wednesday 4 June 2014 17:18

 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: memtable mem usage off by 10?

   In that case I would assume the problem is that for some reason JAMM is
 failing to load, and so the liveRatio it would ordinarily calculate is
 defaulting to 10 - are you using the bundled cassandra launch scripts?


 On 4 June 2014 15:51, Idrén, Johan johan.id...@dice.se wrote:

  I wasn’t supplying it, I was assuming it was using the default. It does
 not exist in my config file. Sorry for the confusion.



   From: Benedict Elliott Smith belliottsm...@datastax.com
  Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Wednesday 4 June 2014 16:36
 To: user@cassandra.apache.org user@cassandra.apache.org

 Subject: Re: memtable mem usage off by 10?

Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry,
 I was going by the documentation. It claims that the property is around in
 2.0.

 But something else is wrong, as Cassandra will crash if you supply an
 invalid property, implying it's not sourcing the config file you're using.
  I'm afraid I don't have the context for why it was removed, but it
 happened as part of the 2.0 release.



 On 4 June 2014 13:59, Jack Krupansky j...@basetechnology.com wrote:

   Yeah, it is in the doc:

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html

 And I don’t find a Jira issue mentioning it being removed, so... what’s
 the full story there?!

 -- Jack Krupansky

  *From:* Idrén, Johan johan.id...@dice.se
 *Sent:* Wednesday, June 4, 2014 8:26 AM
 *To:* user@cassandra.apache.org
 *Subject:* RE: memtable mem usage off by 10?


 Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I
 was going by the documentation. It claims that the property is around in
 2.0.



 If we skip that, part of my reply still makes sense:



 Having memtable_total_size_in_mb set to 20480, memtables are flushed at
 a reported value of ~2GB.



 With a constant overhead of ~10x, as suggested

Re: memtable mem usage off by 10?

2014-06-04 Thread Benedict Elliott Smith
If you are storing small values in your columns, the object overhead is
very substantial. So what is 400Mb on disk may well be 4Gb in memtables, so
if you are measuring the memtable size by the resulting sstable size, you
are not getting an accurate picture. This overhead has been reduced by
about 90% in the upcoming 2.1 release, through tickets 6271
https://issues.apache.org/jira/browse/CASSANDRA-6271, 6689
https://issues.apache.org/jira/browse/CASSANDRA-6689 and 6694
https://issues.apache.org/jira/browse/CASSANDRA-6694.


On 4 June 2014 10:49, Idrén, Johan johan.id...@dice.se wrote:

  Hi,


  I'm seeing some strange behavior of the memtables, both in 1.2.13 and
 2.0.7, basically it looks like it's using 10x less memory than it should
 based on the documentation and options.


  10GB heap for both clusters.

 1.2.x should use 1/3 of the heap for memtables, but it uses max ~300mb
 before flushing


  2.0.7, same but 1/4 and ~250mb


  In the 2.0.7 cluster I set the memtable_total_space_in_mb to 4096, which
 then allowed cassandra to use up to ~400mb for memtables...


  I'm now running with 20480 for memtable_total_space_in_mb and cassandra
 is using ~2GB for memtables.


  Soo, off by 10 somewhere? Has anyone else seen this? Can't find a JIRA
 for any bug connected to this.

 java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)


  BR

 Johan



RE: memtable mem usage off by 10?

2014-06-04 Thread Idrén , Johan
I'm not measuring memtable size by looking at the sstables on disk, no. I'm 
looking through the JMX data. So I would believe (or hope) that I'm getting 
relevant data.


If I have a heap of 10GB and set the memtable usage to 20GB, I would expect to 
hit other problems, but I'm not seeing memory usage over 10GB for the heap, and 
the machine (which has ~30gb of memory) is showing ~10GB free, with ~12GB used 
by cassandra, the rest in caches.


Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not idling.


BR

Johan



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 11:56 AM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

If you are storing small values in your columns, the object overhead is very 
substantial. So what is 400Mb on disk may well be 4Gb in memtables, so if you 
are measuring the memtable size by the resulting sstable size, you are not 
getting an accurate picture. This overhead has been reduced by about 90% in the 
upcoming 2.1 release, through tickets 
6271 (https://issues.apache.org/jira/browse/CASSANDRA-6271), 
6689 (https://issues.apache.org/jira/browse/CASSANDRA-6689) and 
6694 (https://issues.apache.org/jira/browse/CASSANDRA-6694).


On 4 June 2014 10:49, Idrén, Johan 
johan.id...@dice.se wrote:

Hi,


I'm seeing some strange behavior of the memtables, both in 1.2.13 and 2.0.7, 
basically it looks like it's using 10x less memory than it should based on the 
documentation and options.


10GB heap for both clusters.

1.2.x should use 1/3 of the heap for memtables, but it uses max ~300mb before 
flushing


2.0.7, same but 1/4 and ~250mb


In the 2.0.7 cluster I set the memtable_total_space_in_mb to 4096, which then 
allowed cassandra to use up to ~400mb for memtables...


I'm now running with 20480 for memtable_total_space_in_mb and cassandra is 
using ~2GB for memtables.


Soo, off by 10 somewhere? Has anyone else seen this? Can't find a JIRA for any 
bug connected to this.

java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)


BR

Johan



Re: memtable mem usage off by 10?

2014-06-04 Thread Benedict Elliott Smith
These measurements tell you the amount of user data stored in the
memtables, not the amount of heap used to store it, so the same applies.


On 4 June 2014 11:04, Idrén, Johan johan.id...@dice.se wrote:

  I'm not measuring memtable size by looking at the sstables on disk, no.
 I'm looking through the JMX data. So I would believe (or hope) that I'm
 getting relevant data.


  If I have a heap of 10GB and set the memtable usage to 20GB, I would
 expect to hit other problems, but I'm not seeing memory usage over 10GB for
 the heap, and the machine (which has ~30gb of memory) is showing ~10GB
 free, with ~12GB used by cassandra, the rest in caches.


  Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not
 idling.


  BR

 Johan


  --
 *From:* Benedict Elliott Smith belliottsm...@datastax.com
 *Sent:* Wednesday, June 4, 2014 11:56 AM
 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

  If you are storing small values in your columns, the object overhead is
 very substantial. So what is 400Mb on disk may well be 4Gb in memtables, so
 if you are measuring the memtable size by the resulting sstable size, you
 are not getting an accurate picture. This overhead has been reduced by
 about 90% in the upcoming 2.1 release, through tickets 6271
 https://issues.apache.org/jira/browse/CASSANDRA-6271, 6689
 https://issues.apache.org/jira/browse/CASSANDRA-6689 and 6694
 https://issues.apache.org/jira/browse/CASSANDRA-6694.


 On 4 June 2014 10:49, Idrén, Johan johan.id...@dice.se wrote:

  Hi,


  I'm seeing some strange behavior of the memtables, both in 1.2.13 and
 2.0.7, basically it looks like it's using 10x less memory than it should
 based on the documentation and options.


  10GB heap for both clusters.

 1.2.x should use 1/3 of the heap for memtables, but it uses max ~300mb
 before flushing


  2.0.7, same but 1/4 and ~250mb


  In the 2.0.7 cluster I set the memtable_total_space_in_mb to 4096,
 which then allowed cassandra to use up to ~400mb for memtables...


  I'm now running with 20480 for memtable_total_space_in_mb and cassandra
 is using ~2GB for memtables.


  Soo, off by 10 somewhere? Has anyone else seen this? Can't find a JIRA
 for any bug connected to this.

 java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)


  BR

 Johan





RE: memtable mem usage off by 10?

2014-06-04 Thread Idrén , Johan
Aha, ok. Thanks.


Trying to understand what my cluster is doing:


cassandra.db.memtable_data_size only gets me the actual data but not the 
memtable heap memory usage. Is there a way to check for heap memory usage?
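
For reference, a rough way to approximate it from what 1.2/2.0 do expose (a
sketch, assuming nodetool and the default system.log path are available on the
node): take the reported memtable data size and multiply it by the liveRatio
Cassandra logs for that column family.

# Reported memtable size is user data only; the logged liveRatio is the
# estimated heap-bytes-per-data-byte multiplier, so data size x liveRatio
# gives a rough heap figure.
nodetool cfstats | grep -i 'memtable data size'
grep -i 'liveratio' /var/log/cassandra/system.log | tail -3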


I would expect to hit the flush_largest_memtables_at value, and this would be 
what causes the memtable flush to sstable then? By default 0.75?


Then I would expect the amount of memory to be used to be maximum ~3x of what I 
was seeing when I hadn't set memtable_total_space_in_mb (1/4 by default, max 
3/4 before a flush), instead of close to 10x (250mb vs 2gb).

This is of course assuming that the overhead scales linearly with the amount of 
data in my table, we're using one table with three cells in this case. If it 
hardly increases at all, then I'll give up I guess :)

At least until 2.1.0 comes out and I can compare.


BR

Johan



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 12:33 PM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

These measurements tell you the amount of user data stored in the memtables, 
not the amount of heap used to store it, so the same applies.


On 4 June 2014 11:04, Idrén, Johan 
johan.id...@dice.se wrote:

I'm not measuring memtable size by looking at the sstables on disk, no. I'm 
looking through the JMX data. So I would believe (or hope) that I'm getting 
relevant data.


If I have a heap of 10GB and set the memtable usage to 20GB, I would expect to 
hit other problems, but I'm not seeing memory usage over 10GB for the heap, and 
the machine (which has ~30gb of memory) is showing ~10GB free, with ~12GB used 
by cassandra, the rest in caches.


Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not idling.


BR

Johan



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 11:56 AM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

If you are storing small values in your columns, the object overhead is very 
substantial. So what is 400Mb on disk may well be 4Gb in memtables, so if you 
are measuring the memtable size by the resulting sstable size, you are not 
getting an accurate picture. This overhead has been reduced by about 90% in the 
upcoming 2.1 release, through tickets 
6271 (https://issues.apache.org/jira/browse/CASSANDRA-6271), 
6689 (https://issues.apache.org/jira/browse/CASSANDRA-6689) and 
6694 (https://issues.apache.org/jira/browse/CASSANDRA-6694).


On 4 June 2014 10:49, Idrén, Johan 
johan.id...@dice.se wrote:

Hi,


I'm seeing some strange behavior of the memtables, both in 1.2.13 and 2.0.7, 
basically it looks like it's using 10x less memory than it should based on the 
documentation and options.


10GB heap for both clusters.

1.2.x should use 1/3 of the heap for memtables, but it uses max ~300mb before 
flushing


2.0.7, same but 1/4 and ~250mb


In the 2.0.7 cluster I set the memtable_total_space_in_mb to 4096, which then 
allowed cassandra to use up to ~400mb for memtables...


I'm now running with 20480 for memtable_total_space_in_mb and cassandra is 
using ~2GB for memtables.


Soo, off by 10 somewhere? Has anyone else seen this? Can't find a JIRA for any 
bug connected to this.

java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)


BR

Johan




Re: memtable mem usage off by 10?

2014-06-04 Thread Benedict Elliott Smith
Unfortunately it looks like the heap utilisation of memtables was not
exposed in earlier versions, because they only maintained an estimate.

The overhead scales linearly with the amount of data in your memtables
(assuming the size of each cell is approx. constant).

flush_largest_memtables_at is an independent setting to
memtable_total_space_in_mb, and generally has little effect. Ordinarily
sstable flushes are triggered by hitting the memtable_total_space_in_mb
limit. I'm afraid I don't follow where your 3x comes from?


On 4 June 2014 12:04, Idrén, Johan johan.id...@dice.se wrote:

  Aha, ok. Thanks.


  Trying to understand what my cluster is doing:


  cassandra.db.memtable_data_size only gets me the actual data but not the
 memtable heap memory usage. Is there a way to check for heap memory usage?


  I would expect to hit the flush_largest_memtables_at value, and this
 would be what causes the memtable flush to sstable then? By default 0.75?


  Then I would expect the amount of memory to be used to be maximum ~3x of
 what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by
 default, max 3/4 before a flush), instead of close to 10x (250mb vs 2gb).


 This is of course assuming that the overhead scales linearly with the
 amount of data in my table, we're using one table with three cells in this
 case. If it hardly increases at all, then I'll give up I guess :)

 At least until 2.1.0 comes out and I can compare.


  BR

 Johan


  --
 *From:* Benedict Elliott Smith belliottsm...@datastax.com
 *Sent:* Wednesday, June 4, 2014 12:33 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

  These measurements tell you the amount of user data stored in the
 memtables, not the amount of heap used to store it, so the same applies.


 On 4 June 2014 11:04, Idrén, Johan johan.id...@dice.se wrote:

  I'm not measuring memtable size by looking at the sstables on disk, no.
 I'm looking through the JMX data. So I would believe (or hope) that I'm
 getting relevant data.


  If I have a heap of 10GB and set the memtable usage to 20GB, I would
 expect to hit other problems, but I'm not seeing memory usage over 10GB for
 the heap, and the machine (which has ~30gb of memory) is showing ~10GB
 free, with ~12GB used by cassandra, the rest in caches.


  Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not
 idling.


  BR

 Johan


  --
 *From:* Benedict Elliott Smith belliottsm...@datastax.com
 *Sent:* Wednesday, June 4, 2014 11:56 AM
 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

   If you are storing small values in your columns, the object overhead
 is very substantial. So what is 400Mb on disk may well be 4Gb in memtables,
 so if you are measuring the memtable size by the resulting sstable size,
 you are not getting an accurate picture. This overhead has been reduced by
 about 90% in the upcoming 2.1 release, through tickets 6271
 https://issues.apache.org/jira/browse/CASSANDRA-6271, 6689
 https://issues.apache.org/jira/browse/CASSANDRA-6689 and 6694
 https://issues.apache.org/jira/browse/CASSANDRA-6694.


 On 4 June 2014 10:49, Idrén, Johan johan.id...@dice.se wrote:

  Hi,


  I'm seeing some strange behavior of the memtables, both in 1.2.13 and
 2.0.7, basically it looks like it's using 10x less memory than it should
 based on the documentation and options.


  10GB heap for both clusters.

 1.2.x should use 1/3 of the heap for memtables, but it uses max ~300mb
 before flushing


  2.0.7, same but 1/4 and ~250mb


  In the 2.0.7 cluster I set the memtable_total_space_in_mb to 4096,
 which then allowed cassandra to use up to ~400mb for memtables...


  I'm now running with 20480 for memtable_total_space_in_mb and
 cassandra is using ~2GB for memtables.


  Soo, off by 10 somewhere? Has anyone else seen this? Can't find a JIRA
 for any bug connected to this.

 java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)


  BR

 Johan






RE: memtable mem usage off by 10?

2014-06-04 Thread Idrén , Johan
Ok, so the overhead is a constant modifier, right.


The 3x I arrived at with the following assumptions:


heap is 10GB

Default memory for memtable usage is 1/4 of heap in c* 2.0

max memory used for memtables is 2,5GB (10/4)

flush_largest_memtables_at is 0.75

flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the 
default)


With an overhead of 10x, it makes sense that my memtable is flushed when the 
jmx data says it is at ~250MB, ie 2,5GB, ie 1/4 of the heap
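
As a sanity check of that arithmetic (a sketch using the figures from this
thread; 10 is the fallback liveRatio mentioned earlier, not a measured value):

heap_mb=10240                        # -Xmx10G from the startup command
memtable_space_mb=$((heap_mb / 4))   # 2.0 default: 1/4 of the heap
live_ratio=10                        # assumed heap bytes per byte of reported data
echo "expected flush at a reported data size of ~$((memtable_space_mb / live_ratio)) MB"
# prints ~256 MB, matching the ~250MB reported size at which flushes happen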


After I've set the memtable_total_size_in_mb to a value larger than 7,5GB, it 
should still not go over 7,5GB on account of flush_largest_memtables_at, 3/4 
the heap


So I would expect to see memtables flushed to disk after they're being 
reportedly at around 750MB.


Having memtable_total_size_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB.


With a constant overhead, this would mean that it used 20GB, which is 2x the 
size of the heap, instead of 3/4 of the heap as it should be if 
flush_largest_memtables_at was being respected.


This shouldn't be possible.



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 1:19 PM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

Unfortunately it looks like the heap utilisation of memtables was not exposed 
in earlier versions, because they only maintained an estimate.

The overhead scales linearly with the amount of data in your memtables 
(assuming the size of each cell is approx. constant).

flush_largest_memtables_at is an independent setting to 
memtable_total_space_in_mb, and generally has little effect. Ordinarily sstable 
flushes are triggered by hitting the memtable_total_space_in_mb limit. I'm 
afraid I don't follow where your 3x comes from?


On 4 June 2014 12:04, Idrén, Johan 
johan.id...@dice.se wrote:

Aha, ok. Thanks.


Trying to understand what my cluster is doing:


cassandra.db.memtable_data_size only gets me the actual data but not the 
memtable heap memory usage. Is there a way to check for heap memory usage?


I would expect to hit the flush_largest_memtables_at value, and this would be 
what causes the memtable flush to sstable then? By default 0.75?


Then I would expect the amount of memory to be used to be maximum ~3x of what I 
was seeing when I hadn't set memtable_total_space_in_mb (1/4 by default, max 
3/4 before a flush), instead of close to 10x (250mb vs 2gb).

This is of course assuming that the overhead scales linearly with the amount of 
data in my table, we're using one table with three cells in this case. If it 
hardly increases at all, then I'll give up I guess :)

At least until 2.1.0 comes out and I can compare.


BR

Johan



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 12:33 PM

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

These measurements tell you the amount of user data stored in the memtables, 
not the amount of heap used to store it, so the same applies.


On 4 June 2014 11:04, Idrén, Johan 
johan.id...@dice.se wrote:

I'm not measuring memtable size by looking at the sstables on disk, no. I'm 
looking through the JMX data. So I would believe (or hope) that I'm getting 
relevant data.


If I have a heap of 10GB and set the memtable usage to 20GB, I would expect to 
hit other problems, but I'm not seeing memory usage over 10GB for the heap, and 
the machine (which has ~30gb of memory) is showing ~10GB free, with ~12GB used 
by cassandra, the rest in caches.


Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not idling.


BR

Johan



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 11:56 AM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

If you are storing small values in your columns, the object overhead is very 
substantial. So what is 400Mb on disk may well be 4Gb in memtables, so if you 
are measuring the memtable size by the resulting sstable size, you are not 
getting an accurate picture. This overhead has been reduced by about 90% in the 
upcoming 2.1 release, through tickets 
6271 (https://issues.apache.org/jira/browse/CASSANDRA-6271), 
6689 (https://issues.apache.org/jira/browse/CASSANDRA-6689) and 
6694 (https://issues.apache.org/jira/browse/CASSANDRA-6694).


On 4 June 2014 10:49, Idrén, Johan 
johan.id...@dice.se wrote:

Hi,


I'm seeing some strange behavior of the memtables, both in 1.2.13 and 2.0.7, 
basically it looks like it's using 10x less memory than it should based on the 
documentation and options.


10GB heap for both clusters.

1.2.x should use 1/3 of the heap for memtables, but it uses max ~300mb

Re: memtable mem usage off by 10?

2014-06-04 Thread Benedict Elliott Smith
I'm confused: there is no flush_largest_memtables_at property in C* 2.0?


On 4 June 2014 12:55, Idrén, Johan johan.id...@dice.se wrote:

  Ok, so the overhead is a constant modifier, right.


  The 3x I arrived at with the following assumptions:


  heap is 10GB

 Default memory for memtable usage is 1/4 of heap in c* 2.0
  max memory used for memtables is 2,5GB (10/4)

 flush_largest_memtables_at is 0.75

 flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the
 default)


  With an overhead of 10x, it makes sense that my memtable is flushed when
 the jmx data says it is at ~250MB, ie 2,5GB, ie 1/4 of the heap


  After I've set the memtable_total_size_in_mb to a value larger than
 7,5GB, it should still not go over 7,5GB on account of
 flush_largest_memtables_at, 3/4 the heap


  So I would expect to see memtables flushed to disk after they're being
 reportedly at around 750MB.


  Having memtable_total_size_in_mb set to 20480, memtables are flushed at
 a reported value of ~2GB.


  With a constant overhead, this would mean that it used 20GB, which is 2x
 the size of the heap, instead of 3/4 of the heap as it should be if
 flush_largest_memtables_at was being respected.


  This shouldn't be possible.


  --
 *From:* Benedict Elliott Smith belliottsm...@datastax.com
 *Sent:* Wednesday, June 4, 2014 1:19 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

  Unfortunately it looks like the heap utilisation of memtables was not
 exposed in earlier versions, because they only maintained an estimate.

  The overhead scales linearly with the amount of data in your memtables
 (assuming the size of each cell is approx. constant).

  flush_largest_memtables_at is an independent setting to
 memtable_total_space_in_mb, and generally has little effect. Ordinarily
 sstable flushes are triggered by hitting the memtable_total_space_in_mb
 limit. I'm afraid I don't follow where your 3x comes from?


 On 4 June 2014 12:04, Idrén, Johan johan.id...@dice.se wrote:

  Aha, ok. Thanks.


  Trying to understand what my cluster is doing:


  cassandra.db.memtable_data_size only gets me the actual data but not
 the memtable heap memory usage. Is there a way to check for heap memory
 usage?


  I would expect to hit the flush_largest_memtables_at value, and this
 would be what causes the memtable flush to sstable then? By default 0.75?


  Then I would expect the amount of memory to be used to be maximum ~3x
 of what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by
 default, max 3/4 before a flush), instead of close to 10x (250mb vs 2gb).


 This is of course assuming that the overhead scales linearly with the
 amount of data in my table, we're using one table with three cells in this
 case. If it hardly increases at all, then I'll give up I guess :)

 At least until 2.1.0 comes out and I can compare.


  BR

 Johan


  --
  *From:* Benedict Elliott Smith belliottsm...@datastax.com
  *Sent:* Wednesday, June 4, 2014 12:33 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

   These measurements tell you the amount of user data stored in the
 memtables, not the amount of heap used to store it, so the same applies.


 On 4 June 2014 11:04, Idrén, Johan johan.id...@dice.se wrote:

  I'm not measuring memtable size by looking at the sstables on disk,
 no. I'm looking through the JMX data. So I would believe (or hope) that I'm
 getting relevant data.


  If I have a heap of 10GB and set the memtable usage to 20GB, I would
 expect to hit other problems, but I'm not seeing memory usage over 10GB for
 the heap, and the machine (which has ~30gb of memory) is showing ~10GB
 free, with ~12GB used by cassandra, the rest in caches.


  Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not
 idling.


  BR

 Johan


  --
 *From:* Benedict Elliott Smith belliottsm...@datastax.com
 *Sent:* Wednesday, June 4, 2014 11:56 AM
 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

   If you are storing small values in your columns, the object overhead
 is very substantial. So what is 400Mb on disk may well be 4Gb in memtables,
 so if you are measuring the memtable size by the resulting sstable size,
 you are not getting an accurate picture. This overhead has been reduced by
 about 90% in the upcoming 2.1 release, through tickets 6271
 https://issues.apache.org/jira/browse/CASSANDRA-6271, 6689
 https://issues.apache.org/jira/browse/CASSANDRA-6689 and 6694
 https://issues.apache.org/jira/browse/CASSANDRA-6694.


 On 4 June 2014 10:49, Idrén, Johan johan.id...@dice.se wrote:

  Hi,


  I'm seeing some strange behavior of the memtables, both in 1.2.13 and
 2.0.7, basically it looks like it's using 10x less memory than it should
 based on the documentation and options.


  10GB heap for both clusters.

 1.2.x should use 1/3 of the heap

RE: memtable mem usage off by 10?

2014-06-04 Thread Idrén , Johan
Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims that the property is around in 2.0.


If we skip that, part of my reply still makes sense:


Having memtable_total_size_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB.


With a constant overhead of ~10x, as suggested, this would mean that it used 
20GB, which is 2x the size of the heap.


That shouldn't work. According to the OS, cassandra doesn't use more than 
~11-12GB.



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 2:07 PM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

I'm confused: there is no flush_largest_memtables_at property in C* 2.0?


On 4 June 2014 12:55, Idrén, Johan 
johan.id...@dice.se wrote:

Ok, so the overhead is a constant modifier, right.


The 3x I arrived at with the following assumptions:


heap is 10GB

Default memory for memtable usage is 1/4 of heap in c* 2.0

max memory used for memtables is 2,5GB (10/4)

flush_largest_memtables_at is 0.75

flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the 
default)


With an overhead of 10x, it makes sense that my memtable is flushed when the 
jmx data says it is at ~250MB, ie 2,5GB, ie 1/4 of the heap


After I've set the memtable_total_size_in_mb to a value larger than 7,5GB, it 
should still not go over 7,5GB on account of flush_largest_memtables_at, 3/4 
the heap


So I would expect to see memtables flushed to disk after they're being 
reportedly at around 750MB.


Having memtable_total_size_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB.


With a constant overhead, this would mean that it used 20GB, which is 2x the 
size of the heap, instead of 3/4 of the heap as it should be if 
flush_largest_memtables_at was being respected.


This shouldn't be possible.



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 1:19 PM

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

Unfortunately it looks like the heap utilisation of memtables was not exposed 
in earlier versions, because they only maintained an estimate.

The overhead scales linearly with the amount of data in your memtables 
(assuming the size of each cell is approx. constant).

flush_largest_memtables_at is an independent setting to 
memtable_total_space_in_mb, and generally has little effect. Ordinarily sstable 
flushes are triggered by hitting the memtable_total_space_in_mb limit. I'm 
afraid I don't follow where your 3x comes from?


On 4 June 2014 12:04, Idrén, Johan 
johan.id...@dice.se wrote:

Aha, ok. Thanks.


Trying to understand what my cluster is doing:


cassandra.db.memtable_data_size only gets me the actual data but not the 
memtable heap memory usage. Is there a way to check for heap memory usage?


I would expect to hit the flush_largest_memtables_at value, and this would be 
what causes the memtable flush to sstable then? By default 0.75?


Then I would expect the amount of memory to be used to be maximum ~3x of what I 
was seeing when I hadn't set memtable_total_space_in_mb (1/4 by default, max 
3/4 before a flush), instead of close to 10x (250mb vs 2gb).

This is of course assuming that the overhead scales linearly with the amount of 
data in my table, we're using one table with three cells in this case. If it 
hardly increases at all, then I'll give up I guess :)

At least until 2.1.0 comes out and I can compare.


BR

Johan



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 12:33 PM

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

These measurements tell you the amount of user data stored in the memtables, 
not the amount of heap used to store it, so the same applies.


On 4 June 2014 11:04, Idrén, Johan 
johan.id...@dice.se wrote:

I'm not measuring memtable size by looking at the sstables on disk, no. I'm 
looking through the JMX data. So I would believe (or hope) that I'm getting 
relevant data.


If I have a heap of 10GB and set the memtable usage to 20GB, I would expect to 
hit other problems, but I'm not seeing memory usage over 10GB for the heap, and 
the machine (which has ~30gb of memory) is showing ~10GB free, with ~12GB used 
by cassandra, the rest in caches.


Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not idling.


BR

Johan



From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 11:56 AM
To: user@cassandra.apache.org
Subject: Re

Re: memtable mem usage off by 10?

2014-06-04 Thread Jack Krupansky
Yeah, it is in the doc:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html

And I don’t find a Jira issue mentioning it being removed, so... what’s the 
full story there?!

-- Jack Krupansky

From: Idrén, Johan 
Sent: Wednesday, June 4, 2014 8:26 AM
To: user@cassandra.apache.org 
Subject: RE: memtable mem usage off by 10?

Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims that the property is around in 2.0.




If we skip that, part of my reply still makes sense:




Having memtable_total_size_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB. 




With a constant overhead of ~10x, as suggested, this would mean that it used 
20GB, which is 2x the size of the heap.




That shouldn't work. According to the OS, cassandra doesn't use more than 
~11-12GB.







From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 2:07 PM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10? 

I'm confused: there is no flush_largest_memtables_at property in C* 2.0?



On 4 June 2014 12:55, Idrén, Johan johan.id...@dice.se wrote:

  Ok, so the overhead is a constant modifier, right.




  The 3x I arrived at with the following assumptions:




  heap is 10GB


  Default memory for memtable usage is 1/4 of heap in c* 2.0


  max memory used for memtables is 2,5GB (10/4)
  flush_largest_memtables_at is 0.75


  flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the 
default)




  With an overhead of 10x, it makes sense that my memtable is flushed when the 
jmx data says it is at ~250MB, ie 2,5GB, ie 1/4 of the heap




  After I've set the memtable_total_size_in_mb to a value larger than 7,5GB, it 
should still not go over 7,5GB on account of flush_largest_memtables_at, 3/4 
the heap




  So I would expect to see memtables flushed to disk after they're being 
reportedly at around 750MB.




  Having memtable_total_size_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB. 




  With a constant overhead, this would mean that it used 20GB, which is 2x the 
size of the heap, instead of 3/4 of the heap as it should be if 
flush_largest_memtables_at was being respected.




  This shouldn't be possible.





--

  From: Benedict Elliott Smith belliottsm...@datastax.com

  Sent: Wednesday, June 4, 2014 1:19 PM 

  To: user@cassandra.apache.org
  Subject: Re: memtable mem usage off by 10?

  Unfortunately it looks like the heap utilisation of memtables was not exposed 
in earlier versions, because they only maintained an estimate. 

  The overhead scales linearly with the amount of data in your memtables 
(assuming the size of each cell is approx. constant). 

  flush_largest_memtables_at is an independent setting to 
memtable_total_space_in_mb, and generally has little effect. Ordinarily sstable 
flushes are triggered by hitting the memtable_total_space_in_mb limit. I'm 
afraid I don't follow where your 3x comes from?




  On 4 June 2014 12:04, Idrén, Johan johan.id...@dice.se wrote:

Aha, ok. Thanks.




Trying to understand what my cluster is doing:




cassandra.db.memtable_data_size only gets me the actual data but not the 
memtable heap memory usage. Is there a way to check for heap memory usage?





I would expect to hit the flush_largest_memtables_at value, and this would 
be what causes the memtable flush to sstable then? By default 0.75?




Then I would expect the amount of memory to be used to be maximum ~3x of 
what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by default, 
max 3/4 before a flush), instead of close to 10x (250mb vs 2gb).


This is of course assuming that the overhead scales linearly with the 
amount of data in my table, we're using one table with three cells in this 
case. If it hardly increases at all, then I'll give up I guess :)

At least until 2.1.0 comes out and I can compare.




BR

Johan






From: Benedict Elliott Smith belliottsm...@datastax.com

Sent: Wednesday, June 4, 2014 12:33 PM 

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

These measurements tell you the amount of user data stored in the 
memtables, not the amount of heap used to store it, so the same applies.



On 4 June 2014 11:04, Idrén, Johan johan.id...@dice.se wrote:

  I'm not measuring memtable size by looking at the sstables on disk, no. 
I'm looking through the JMX data. So I would believe (or hope) that I'm getting 
relevant data. 




  If I have a heap of 10GB and set the memtable usage to 20GB, I would 
expect to hit other problems, but I'm not seeing

Re: memtable mem usage off by 10?

2014-06-04 Thread Benedict Elliott Smith

 Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I
 was going by the documentation. It claims that the property is around in
 2.0.

But something else is wrong, as Cassandra will crash if you supply an
invalid property, implying it's not sourcing the config file you're using.
I'm afraid I don't have the context for why it was removed, but it happened
as part of the 2.0 release.
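
A quick way to rule that out (a sketch, assuming the RPM layout where
cassandra.yaml lives under /etc/cassandra/conf, the directory that appears
first on the classpath in the startup command shown earlier in the thread):
check whether either setting is actually present in the file being picked up.

# An unknown option in the yaml that the daemon reads would fail startup on 2.0,
# so if this grep finds nothing, neither setting is being supplied from there.
grep -n 'flush_largest_memtables_at\|memtable_total_space_in_mb' /etc/cassandra/conf/cassandra.yaml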



On 4 June 2014 13:59, Jack Krupansky j...@basetechnology.com wrote:

   Yeah, it is in the doc:

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html

 And I don’t find a Jira issue mentioning it being removed, so... what’s
 the full story there?!

 -- Jack Krupansky

  *From:* Idrén, Johan johan.id...@dice.se
 *Sent:* Wednesday, June 4, 2014 8:26 AM
 *To:* user@cassandra.apache.org
 *Subject:* RE: memtable mem usage off by 10?


 Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I
 was going by the documentation. It claims that the property is around in
 2.0.



 If we skip that, part of my reply still makes sense:



 Having memtable_total_size_in_mb set to 20480, memtables are flushed at a
 reported value of ~2GB.



 With a constant overhead of ~10x, as suggested, this would mean that it
 used 20GB, which is 2x the size of the heap.



 That shouldn't work. According to the OS, cassandra doesn't use more than
 ~11-12GB.


  --
 *From:* Benedict Elliott Smith belliottsm...@datastax.com
 *Sent:* Wednesday, June 4, 2014 2:07 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

  I'm confused: there is no flush_largest_memtables_at property in C* 2.0?


 On 4 June 2014 12:55, Idrén, Johan johan.id...@dice.se wrote:

  Ok, so the overhead is a constant modifier, right.



 The 3x I arrived at with the following assumptions:



 heap is 10GB

 Default memory for memtable usage is 1/4 of heap in c* 2.0
 max memory used for memtables is 2,5GB (10/4)

 flush_largest_memtables_at is 0.75

 flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the
 default)



 With an overhead of 10x, it makes sense that my memtable is flushed when
 the jmx data says it is at ~250MB, ie 2,5GB, ie 1/4 of the heap



 After I've set the memtable_total_size_in_mb to a value larger than
 7,5GB, it should still not go over 7,5GB on account of
 flush_largest_memtables_at, 3/4 the heap



 So I would expect to see memtables flushed to disk after they're being
 reportedly at around 750MB.



 Having memtable_total_size_in_mb set to 20480, memtables are flushed at a
 reported value of ~2GB.



 With a constant overhead, this would mean that it used 20GB, which is 2x
 the size of the heap, instead of 3/4 of the heap as it should be if
 flush_largest_memtables_at was being respected.



 This shouldn't be possible.


  --
  *From:* Benedict Elliott Smith belliottsm...@datastax.com
 *Sent:* Wednesday, June 4, 2014 1:19 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

   Unfortunately it looks like the heap utilisation of memtables was not
 exposed in earlier versions, because they only maintained an estimate.

 The overhead scales linearly with the amount of data in your memtables
 (assuming the size of each cell is approx. constant).

 flush_largest_memtables_at is an independent setting to
 memtable_total_space_in_mb, and generally has little effect. Ordinarily
 sstable flushes are triggered by hitting the memtable_total_space_in_mb
 limit. I'm afraid I don't follow where your 3x comes from?


 On 4 June 2014 12:04, Idrén, Johan johan.id...@dice.se wrote:

  Aha, ok. Thanks.



 Trying to understand what my cluster is doing:



 cassandra.db.memtable_data_size only gets me the actual data but not
 the memtable heap memory usage. Is there a way to check for heap memory
 usage?


 I would expect to hit the flush_largest_memtables_at value, and this
 would be what causes the memtable flush to sstable then? By default 0.75?


 Then I would expect the amount of memory to be used to be maximum ~3x of
 what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by
 default, max 3/4 before a flush), instead of close to 10x (250mb vs 2gb).


 This is of course assuming that the overhead scales linearly with the
 amount of data in my table, we're using one table with three cells in this
 case. If it hardly increases at all, then I'll give up I guess :)

 At least until 2.1.0 comes out and I can compare.


 BR

 Johan


  --
  *From:* Benedict Elliott Smith belliottsm...@datastax.com
 *Sent:* Wednesday, June 4, 2014 12:33 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: memtable mem usage off by 10?

   These measurements tell you the amount of user data stored in the
 memtables, not the amount of heap used to store it, so the same applies.


 On 4 June 2014 11:04, Idrén, Johan johan.id...@dice.se wrote

Re: memtable mem usage off by 10?

2014-06-04 Thread Idrén , Johan
I wasn’t supplying it, I was assuming it was using the default. It does not 
exist in my config file. Sorry for the confusion.



From: Benedict Elliott Smith belliottsm...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Wednesday 4 June 2014 16:36
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims that the property is around in 2.0.

But something else is wrong, as Cassandra will crash if you supply an invalid 
property, implying it's not sourcing the config file you're using.

I'm afraid I don't have the context for why it was removed, but it happened as 
part of the 2.0 release.


On 4 June 2014 13:59, Jack Krupansky 
j...@basetechnology.com wrote:
Yeah, it is in the doc:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html

And I don’t find a Jira issue mentioning it being removed, so... what’s the 
full story there?!

-- Jack Krupansky

From: Idrén, Johan johan.id...@dice.se
Sent: Wednesday, June 4, 2014 8:26 AM
To: user@cassandra.apache.org
Subject: RE: memtable mem usage off by 10?


Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims that the property is around in 2.0.



If we skip that, part of my reply still makes sense:



Having memtable_total_size_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB.



With a constant overhead of ~10x, as suggested, this would mean that it used 
20GB, which is 2x the size of the heap.



That shouldn't work. According to the OS, cassandra doesn't use more than 
~11-12GB.




From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 2:07 PM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

I'm confused: there is no flush_largest_memtables_at property in C* 2.0?


On 4 June 2014 12:55, Idrén, Johan 
johan.id...@dice.se wrote:

Ok, so the overhead is a constant modifier, right.



The 3x I arrived at with the following assumptions:



heap is 10GB

Default memory for memtable usage is 1/4 of heap in c* 2.0

max memory used for memtables is 2,5GB (10/4)

flush_largest_memtables_at is 0.75

flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the 
default)



With an overhead of 10x, it makes sense that my memtable is flushed when the 
jmx data says it is at ~250MB, ie 2,5GB, ie 1/4 of the heap



After I've set the memtable_total_size_in_mb to a value larger than 7,5GB, it 
should still not go over 7,5GB on account of flush_largest_memtables_at, 3/4 
the heap



So I would expect to see memtables flushed to disk after they're being 
reportedly at around 750MB.



Having memtable_total_size_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB.



With a constant overhead, this would mean that it used 20GB, which is 2x the 
size of the heap, instead of 3/4 of the heap as it should be if 
flush_largest_memtables_at was being respected.



This shouldn't be possible.
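
As a rough cross-check, if the reported figure is the raw data size and the flush accounting multiplies it by the suggested ~10x overhead estimate before comparing against the limit, the numbers above do line up:

# 20480 MB limit divided by an assumed 10x overhead (liveRatio) estimate
echo $((20480 / 10))   # prints 2048, i.e. a flush at a reported ~2GB

That would also be consistent with the OS never showing anything near 20GB, since the 10x factor would then be an accounting estimate rather than memory actually allocated.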




From: Benedict Elliott Smith belliottsm...@datastax.com
Sent: Wednesday, June 4, 2014 1:19 PM

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

Unfortunately it looks like the heap utilisation of memtables was not exposed 
in earlier versions, because they only maintained an estimate.

The overhead scales linearly with the amount of data in your memtables 
(assuming the size of each cell is approx. constant).

flush_largest_memtables_at is an independent setting to 
memtable_total_space_in_mb, and generally has little effect. Ordinarily sstable 
flushes are triggered by hitting the memtable_total_space_in_mb limit. I'm 
afraid I don't follow where your 3x comes from?
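
For the estimate that is exposed, one rough way to read it per table (a sketch; label wording can differ between versions) is via nodetool, which pulls the same figures over JMX:

nodetool cfstats | grep -i 'memtable data size'
# this is the serialized data size estimate, not heap usage; heap occupancy is
# approximated as data size multiplied by the liveRatio, which defaults to 10
# when jamm is not attached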


On 4 June 2014 12:04, Idrén, Johan 
johan.id...@dice.se wrote:

Aha, ok. Thanks.



Trying to understand what my cluster is doing:



cassandra.db.memtable_data_size only gets me the actual data but not the 
memtable heap memory usage. Is there a way to check for heap memory usage?


I would expect to hit the flush_largest_memtables_at value, and this would be 
what causes the memtable flush to sstable then? By default 0.75?


Then I would expect the amount of memory to be used to be maximum ~3x of what I 
was seeing when I hadn't set memtable_total_space_in_mb (1/4 by default, max 
3/4 before a flush), instead of close to 10x

Re: memtable mem usage off by 10?

2014-06-04 Thread Jack Krupansky
And sorry that the doc confused you as well!

-- Jack Krupansky

From: Idrén, Johan 
Sent: Wednesday, June 4, 2014 10:51 AM
To: user@cassandra.apache.org 
Subject: Re: memtable mem usage off by 10?

I wasn’t supplying it, I was assuming it was using the default. It does not 
exist in my config file. Sorry for the confusion.




From: Benedict Elliott Smith belliottsm...@datastax.com
Reply-To: user@cassandra.apache.org user@cassandra.apache.org
Date: Wednesday 4 June 2014 16:36
To: user@cassandra.apache.org user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?


  Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims that the property is around in 2.0.
But something else is wrong, as Cassandra will crash if you supply an invalid 
property, implying it's not sourcing the config file you're using.


I'm afraid I don't have the context for why it was removed, but it happened as 
part of the 2.0 release.



On 4 June 2014 13:59, Jack Krupansky j...@basetechnology.com wrote:

  Yeah, it is in the doc:
  
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html

  And I don’t find a Jira issue mentioning it being removed, so... what’s the 
full story there?!

  -- Jack Krupansky

  From: Idrén, Johan 
  Sent: Wednesday, June 4, 2014 8:26 AM
  To: user@cassandra.apache.org 
  Subject: RE: memtable mem usage off by 10?

  Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I was 
going by the documentation. It claims that the property is around in 2.0.




  If we skip that, part of my reply still makes sense:




  Having memtable_total_space_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB. 




  With a constant overhead of ~10x, as suggested, this would mean that it used 
20GB, which is 2x the size of the heap.




  That shouldn't work. According to the OS, cassandra doesn't use more than 
~11-12GB.





--

  From: Benedict Elliott Smith belliottsm...@datastax.com
  Sent: Wednesday, June 4, 2014 2:07 PM
  To: user@cassandra.apache.org
  Subject: Re: memtable mem usage off by 10? 

  I'm confused: there is no flush_largest_memtables_at property in C* 2.0?



  On 4 June 2014 12:55, Idrén, Johan johan.id...@dice.se wrote:

Ok, so the overhead is a constant modifier, right.




The 3x I arrived at with the following assumptions:




heap is 10GB


Default memory for memtable usage is 1/4 of heap in c* 2.0


max memory used for memtables is 2,5GB (10/4)
flush_largest_memtables_at is 0.75


flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the 
default)




With an overhead of 10x, it makes sense that my memtable is flushed when 
the jmx data says it is at ~250MB, ie 2,5GB, ie 1/4 of the heap




After I've set the memtable_total_space_in_mb to a value larger than 7,5GB, 
it should still not go over 7,5GB on account of flush_largest_memtables_at, 3/4 
the heap




So I would expect to see memtables flushed to disk after they're being 
reportedly at around 750MB.




Having memtable_total_space_in_mb set to 20480, memtables are flushed at a 
reported value of ~2GB. 




With a constant overhead, this would mean that it used 20GB, which is 2x 
the size of the heap, instead of 3/4 of the heap as it should be if 
flush_largest_memtables_at was being respected.




This shouldn't be possible.







From: Benedict Elliott Smith belliottsm...@datastax.com

Sent: Wednesday, June 4, 2014 1:19 PM 

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

Unfortunately it looks like the heap utilisation of memtables was not 
exposed in earlier versions, because they only maintained an estimate. 

The overhead scales linearly with the amount of data in your memtables 
(assuming the size of each cell is approx. constant). 

flush_largest_memtables_at is an independent setting to 
memtable_total_space_in_mb, and generally has little effect. Ordinarily sstable 
flushes are triggered by hitting the memtable_total_space_in_mb limit. I'm 
afraid I don't follow where your 3x comes from?




On 4 June 2014 12:04, Idrén, Johan johan.id...@dice.se wrote:

  Aha, ok. Thanks.




  Trying to understand what my cluster is doing:




  cassandra.db.memtable_data_size only gets me the actual data but not the 
memtable heap memory usage. Is there a way to check for heap memory usage?





  I would expect to hit the flush_largest_memtables_at value, and this 
would be what causes the memtable flush to sstable then? By default 0.75?




  Then I would expect the amount of memory to be used to be maximum ~3x of 
what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by default, 
max 3/4 before a flush), instead of close to 10x (250mb vs 2gb).

Re: memtable mem usage off by 10?

2014-06-04 Thread Benedict Elliott Smith
In that case I would assume the problem is that for some reason JAMM is
failing to load, and so the liveRatio it would ordinarily calculate is
defaulting to 10 - are you using the bundled cassandra launch scripts?
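
A quick way to check both halves of that (a sketch, assuming a packaged install with logs under /var/log/cassandra; adjust paths as needed):

# is the jamm agent actually attached to the running JVM?
ps -ef | grep '[C]assandraDaemon' | grep -o 'javaagent:[^ ]*jamm[^ ]*'

# if jamm failed to load, a warning is logged and liveRatio falls back to 10;
# exact wording varies by version, so grep broadly
grep -iE 'jamm|liveratio|memorymeter' /var/log/cassandra/system.log | tail -20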


On 4 June 2014 15:51, Idrén, Johan johan.id...@dice.se wrote:

  I wasn’t supplying it, I was assuming it was using the default. It does
 not exist in my config file. Sorry for the confusion.



   From: Benedict Elliott Smith belliottsm...@datastax.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Wednesday 4 June 2014 16:36
 To: user@cassandra.apache.org user@cassandra.apache.org

 Subject: Re: memtable mem usage off by 10?

Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry,
 I was going by the documentation. It claims that the property is around in
 2.0.

 But something else is wrong, as Cassandra will crash if you supply an
 invalid property, implying it's not sourcing the config file you're using.
  I'm afraid I don't have the context for why it was removed, but it
 happened as part of the 2.0 release.



 On 4 June 2014 13:59, Jack Krupansky j...@basetechnology.com wrote:

   Yeah, it is in the doc:

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html

 And I don’t find a Jira issue mentioning it being removed, so... what’s
 the full story there?!

 -- Jack Krupansky

  From: Idrén, Johan johan.id...@dice.se
 Sent: Wednesday, June 4, 2014 8:26 AM
 To: user@cassandra.apache.org
 Subject: RE: memtable mem usage off by 10?


 Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I
 was going by the documentation. It claims that the property is around in
 2.0.



 If we skip that, part of my reply still makes sense:



 Having memtable_total_space_in_mb set to 20480, memtables are flushed at a
 reported value of ~2GB.



 With a constant overhead of ~10x, as suggested, this would mean that it
 used 20GB, which is 2x the size of the heap.



 That shouldn't work. According to the OS, cassandra doesn't use more than
 ~11-12GB.


  --
 From: Benedict Elliott Smith belliottsm...@datastax.com
 Sent: Wednesday, June 4, 2014 2:07 PM
 To: user@cassandra.apache.org
 Subject: Re: memtable mem usage off by 10?

  I'm confused: there is no flush_largest_memtables_at property in C* 2.0?


 On 4 June 2014 12:55, Idrén, Johan johan.id...@dice.se wrote:

  Ok, so the overhead is a constant modifier, right.



 The 3x I arrived at with the following assumptions:



 heap is 10GB

 Default memory for memtable usage is 1/4 of heap in c* 2.0
  max memory used for memtables is 2,5GB (10/4)

 flush_largest_memtables_at is 0.75

 flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the
 default)



 With an overhead of 10x, it makes sense that my memtable is flushed when
 the jmx data says it is at ~250MB, ie 2,5GB, ie 1/4 of the heap



 After I've set the memtable_total_space_in_mb to a value larger than
 7,5GB, it should still not go over 7,5GB on account of
 flush_largest_memtables_at, 3/4 the heap



 So I would expect to see memtables flushed to disk after they're being
 reportedly at around 750MB.



 Having memtable_total_space_in_mb set to 20480, memtables are flushed at
 a reported value of ~2GB.



 With a constant overhead, this would mean that it used 20GB, which is 2x
 the size of the heap, instead of 3/4 of the heap as it should be if
 flush_largest_memtables_at was being respected.



 This shouldn't be possible.


  --
  From: Benedict Elliott Smith belliottsm...@datastax.com
  Sent: Wednesday, June 4, 2014 1:19 PM

 To: user@cassandra.apache.org
 Subject: Re: memtable mem usage off by 10?

   Unfortunately it looks like the heap utilisation of memtables was not
 exposed in earlier versions, because they only maintained an estimate.

 The overhead scales linearly with the amount of data in your memtables
 (assuming the size of each cell is approx. constant).

 flush_largest_memtables_at is an independent setting to
 memtable_total_space_in_mb, and generally has little effect. Ordinarily
 sstable flushes are triggered by hitting the memtable_total_space_in_mb
 limit. I'm afraid I don't follow where your 3x comes from?


 On 4 June 2014 12:04, Idrén, Johan johan.id...@dice.se wrote:

  Aha, ok. Thanks.



 Trying to understand what my cluster is doing:



 cassandra.db.memtable_data_size only gets me the actual data but not
 the memtable heap memory usage. Is there a way to check for heap memory
 usage?


  I would expect to hit the flush_largest_memtables_at value, and this
 would be what causes the memtable flush to sstable then? By default 0.75?


  Then I would expect the amount of memory to be used to be maximum ~3x
 of what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by
 default, max 3/4 before a flush), instead of close to 10x (250mb vs 2gb).


 This is of course