Re: cassandra OOM

2017-04-26 Thread Jean Carlo
Hello @Durity

Would you mind sharing some information about your cluster? I am mainly
interested in which version of Cassandra you use, and how long your GC
pauses take.
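
In case it helps, two quick ways to pull those numbers on a running node
(the log path is the packaged-install default and may differ on your
setup):

$ nodetool gcstats
$ grep GCInspector /var/log/cassandra/system.log | tail -20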


Thank you very much


Regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Tue, Apr 25, 2017 at 7:47 PM, Durity, Sean R
<sean_r_dur...@homedepot.com> wrote:

> We have seen much better stability (and MUCH fewer GC pauses) from G1 with
> a variety of heap sizes. I don’t even consider CMS any more.
>
> Sean Durity

Re: cassandra OOM

2017-04-25 Thread Carlos Rolo
To add a data point to this thread: we have seen both cases - CMS easily
outperforming G1 for the same heap size, and the inverse too. On the same
cluster we run both collectors for different (datacenter-based) workloads,
chosen purely on measured performance per workload.

It would be good to collect this information and turn it into a talk/blog,
but that's for a later time.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | LinkedIn:
linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Tue, Apr 25, 2017 at 6:47 PM, Durity, Sean R
<sean_r_dur...@homedepot.com> wrote:

> We have seen much better stability (and MUCH fewer GC pauses) from G1 with
> a variety of heap sizes. I don’t even consider CMS any more.
>
> Sean Durity

RE: cassandra OOM

2017-04-25 Thread Durity, Sean R
We have seen much better stability (and MUCH fewer GC pauses) from G1 with a
variety of heap sizes. I don’t even consider CMS any more.


Sean Durity


Re: cassandra OOM

2017-04-04 Thread Gopal, Dhruva
Thanks, that’s interesting – so CMS is a better option for 
stability/performance? We’ll try this out in our cluster.


Re: cassandra OOM

2017-04-03 Thread Alexander Dejanovski
Hi,

We've seen G1GC go OOM on production clusters (repeatedly) with a 16GB
heap when the workload is intense, and given you're running on m4.2xl
instances I wouldn't go over 16GB for the heap.

I'd suggest reverting to CMS with a 16GB heap and up to 6GB of new gen.
You can use 5 as an initial MaxTenuringThreshold and enable GC logging to
fine-tune the settings afterwards.
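
A minimal sketch of the above in jvm.options form - standard HotSpot flag
names, with the values being the starting points suggested here rather
than tuned settings:

-Xms16G
-Xmx16G
-Xmn6G
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:MaxTenuringThreshold=5
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/var/log/cassandra/gc.log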

FYI CMS tends to perform better than G1 even though it's a little bit
harder to tune.

Cheers,


Re: cassandra OOM

2017-04-03 Thread Gopal, Dhruva
16 Gig heap, with G1. Pertinent info from jvm.options below (we’re using 
m2.2xlarge instances in AWS):


#
# HEAP SETTINGS #
#

# Heap size is automatically calculated by cassandra-env based on this
# formula: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
# That is:
# - calculate 1/2 ram and cap to 1024MB
# - calculate 1/4 ram and cap to 8192MB
# - pick the max
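#
# (Worked example, assuming the ~34GB RAM of an m2.2xlarge: min(17GB,
# 1024MB) = 1024MB; min(8.5GB, 8GB) = 8GB; the max of the two is an 8GB
# default heap - hence the explicit 16G override below.)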
#
# For production use you may wish to adjust this for your environment.
# If that's the case, uncomment the -Xmx and Xms options below to override the
# automatic calculation of JVM heap memory.
#
# It is recommended to set min (-Xms) and max (-Xmx) heap sizes to
# the same value to avoid stop-the-world GC pauses during resize, and
# so that we can lock the heap in memory on startup to prevent any
# of it from being swapped out.
-Xms16G
-Xmx16G

# Young generation size is automatically calculated by cassandra-env
# based on this formula: min(100 * num_cores, 1/4 * heap size)
#
# The main trade-off for the young generation is that the larger it
# is, the longer GC pause times will be. The shorter it is, the more
# expensive GC will be (usually).
#
# It is not recommended to set the young generation size if using the
# G1 GC, since that will override the target pause-time goal.
# More info: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
#
# The example below assumes a modern 8-core+ machine for decent
# times. If in doubt, and if you do not particularly want to tweak, go
# 100 MB per physical CPU core.
#-Xmn800M

#
#  GC SETTINGS  #
#

### CMS Settings

#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
#-XX:+CMSParallelRemarkEnabled
#-XX:SurvivorRatio=8
#-XX:MaxTenuringThreshold=1
#-XX:CMSInitiatingOccupancyFraction=75
#-XX:+UseCMSInitiatingOccupancyOnly
#-XX:CMSWaitDuration=10000
#-XX:+CMSParallelInitialMarkEnabled
#-XX:+CMSEdenChunksRecordAlways
# some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
#-XX:+CMSClassUnloadingEnabled

### G1 Settings (experimental, comment previous section and uncomment section 
below to enable)

## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
#
## Have the JVM do less remembered set work during STW, instead
## preferring concurrent GC. Reduces p99.9 latency.
-XX:G1RSetUpdatingPauseTimePercent=5
#
## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
## 200ms is the JVM default and lowest viable setting
## 1000ms increases throughput. Keep it smaller than the timeouts in 
cassandra.yaml.
-XX:MaxGCPauseMillis=500

## Optional G1 Settings

# Save CPU time on large (>= 16GB) heaps by delaying region scanning
# until the heap is 70% full. The default in Hotspot 8u40 is 40%.
-XX:InitiatingHeapOccupancyPercent=70

# For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number 
of logical cores.
# Otherwise equal to the number of cores when 8 or less.
# Machines with > 10 cores should try setting these to <= full cores.
#-XX:ParallelGCThreads=16
# By default, ConcGCThreads is 1/4 of ParallelGCThreads.
# Setting both to the same value can reduce STW durations.
#-XX:ConcGCThreads=16

### GC logging options -- uncomment to enable

#-XX:+PrintGCDetails
#-XX:+PrintGCDateStamps
#-XX:+PrintHeapAtGC
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
#-XX:+PrintPromotionFailure
#-XX:PrintFLSStatistics=1
#-Xloggc:/var/log/cassandra/gc.log
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=10
#-XX:GCLogFileSize=10M



Re: cassandra OOM

2017-04-03 Thread Alexander Dejanovski
Hi,

Could you share your GC settings? G1 or CMS? Heap size, etc.?
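
If you're not sure where those live: on a 3.x package install the GC flags
are usually in /etc/cassandra/jvm.options (path may vary by install).
Something like this shows the active choices:

$ grep -E 'Xm[sxn]|UseG1GC|UseConcMarkSweepGC' /etc/cassandra/jvm.options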

Thanks,

On Sun, Apr 2, 2017 at 10:30 PM Gopal, Dhruva 
wrote:

> Hi –
>
>   We’ve had what looks like an OOM situation with Cassandra (we have a
> dump file that got generated) in our staging (performance/load testing)
> environment, and I wanted to reach out to this user group to see if you had
> any recommendations on how we should approach our investigation as to the
> cause of this issue. The logs don’t seem to point to any obvious issues,
> and we’re no experts in analyzing this by any means, so we were looking for
> guidance on how to proceed. Should we enter a Jira as well? We’re on
> Cassandra 3.9, and are running a six-node cluster. This happened in a
> controlled load-testing environment. Feedback will be much appreciated!
>
>
>
>
>
> Regards,
>
> Dhruva
>
--
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Cassandra OOM on joining existing ring

2015-07-13 Thread Sebastian Estevez
Are you on Azure premium storage?
http://www.datastax.com/2015/04/getting-started-with-azure-premium-storage-and-datastax-enterprise-dse

Secondary indexes are built for convenience, not performance.
http://www.datastax.com/resources/data-modeling

What's your compaction strategy? Your nodes have to come up in order for
them to start compacting.
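
Once a node does come up, a quick way to watch the compaction backlog -
standard nodetool commands, with the table name taken from later in this
thread:

$ nodetool compactionstats
$ nodetool cfstats app_10001.daily_challenges | grep 'SSTable count'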

Re: Cassandra OOM on joining existing ring

2015-07-13 Thread Anuj Wadehra
We faced a similar issue where we had 60k sstables due to the coldness bug in
2.0.3. We solved it by following the DataStax production recommendations at
http://docs.datastax.com/en/cassandra/1.2/cassandra/install/installRecommendSettings.html :


Step 1: Add the following line to /etc/sysctl.conf:

vm.max_map_count = 131072

Step 2: To make the changes take effect, reboot the server or run the
following command:

$ sudo sysctl -p

Step 3 (optional): To confirm the limits are applied to the Cassandra
process, run the following command, where pid is the process ID of the
currently running Cassandra process:

$ cat /proc/pid/limits



You can try the above settings and share your results.
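
A quick way to see how close the node actually is to the mmap limit -
plain /proc inspection, with pid being the Cassandra process ID:

$ sysctl vm.max_map_count
$ wc -l /proc/pid/maps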


Thanks

Anuj



Re: Cassandra OOM on joining existing ring

2015-07-12 Thread Kunal Gangakhedkar
Hi,

Looks like that is my primary problem - the sstable count for the
daily_challenges column family is ~5k. Azure had a scheduled maintenance
window on Sat. All the VMs got rebooted one by one - including the current
Cassandra one - and it's taking forever to bring Cassandra back up online.

Is there any way I can re-organize my existing data so that I can bring
down that count? I don't want to lose it. If possible, can I do that while
Cassandra is down? As I mentioned, it's taking forever to get the service
up - it's stuck reading those 5k sstable files (+ another 5k for the
corresponding secondary index). :(
Oh, did I mention I'm new to Cassandra?

Thanks,
Kunal

On 11 July 2015 at 03:29, Sebastian Estevez sebastian.este...@datastax.com
wrote:

 #1

 There is one table - daily_challenges - which shows compacted partition
 max bytes as ~460M and another one - daily_guest_logins - which shows
 compacted partition max bytes as ~36M.


 460MB is high; I like to keep my partitions under 100MB when possible. I've
 seen worse though. The fix is to add something else (maybe month or week or
 something) into your partition key:

  PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)

 #2 looks like your jamm version is 3 per your env.sh, so you're probably
 okay to copy the env.sh over from the C* 3.0 link I shared once you
 uncomment and tweak the MAX_HEAP. If there's something wrong your node
 won't come up. Tail your logs.



 All the best,



 Sebastián Estévez

 Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com


 On Fri, Jul 10, 2015 at 2:44 PM, Kunal Gangakhedkar 
 kgangakhed...@gmail.com wrote:

 And here is my cassandra-env.sh
 https://gist.github.com/kunalg/2c092cb2450c62be9a20

 Kunal

 On 11 July 2015 at 00:04, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 From the jhat output, the top 10 entries for Instance Count for All Classes
 (excluding platform) show:

 2088223 instances of class org.apache.cassandra.db.BufferCell
 1983245 instances of class
 org.apache.cassandra.db.composites.CompoundSparseCellName
 1885974 instances of class
 org.apache.cassandra.db.composites.CompoundDenseCellName
 63 instances of class
 org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
 503687 instances of class org.apache.cassandra.db.BufferDeletedCell
 378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
 101800 instances of class org.apache.cassandra.utils.concurrent.Ref
 101800 instances of class
 org.apache.cassandra.utils.concurrent.Ref$State
 90704 instances of class
 org.apache.cassandra.utils.concurrent.Ref$GlobalState
 71123 instances of class org.apache.cassandra.db.BufferDecoratedKey

 At the bottom of the page, it shows:
 Total of 8739510 instances occupying 193607512 bytes.
 JFYI.

 Kunal


Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Kunal Gangakhedkar
Attaching the stack dump captured from the last OOM.

Kunal


ERROR [SharedPool-Worker-6] 2015-07-10 05:12:16,862 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_45]
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_45]
        at org.apache.cassandra.utils.memory.SlabAllocator.getRegion(SlabAllocator.java:137) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.utils.memory.SlabAllocator.allocate(SlabAllocator.java:97) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.utils.memory.ContextAllocator.allocate(ContextAllocator.java:57) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.utils.memory.ContextAllocator.clone(ContextAllocator.java:47) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.utils.memory.MemtableBufferAllocator.clone(MemtableBufferAllocator.java:61) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.Memtable.put(Memtable.java:192) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1212) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:131) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.index.SecondaryIndexManager$StandardUpdater.insert(SecondaryIndexManager.java:791) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:444) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:418) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.utils.btree.BTree.build(BTree.java:116) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.utils.btree.BTree.update(BTree.java:177) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.Memtable.put(Memtable.java:210) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1212) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:389) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:352) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_45]
        at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.7.jar:2.1.7]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.7.jar:2.1.7]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
ERROR [CompactionExecutor:3] 2015-07-10 05:12:16,862 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:3,1,main]
java.lang.OutOfMemoryError: Java heap space
        at java.util.ArrayDeque.doubleCapacity(ArrayDeque.java:157) ~[na:1.8.0_45]
        at

Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Kunal Gangakhedkar
Forgot to mention: the data size is not that big - it's barely 10GB in all.

Kunal

On 10 July 2015 at 13:29, Kunal Gangakhedkar kgangakhed...@gmail.com
wrote:

 Hi,

 I have a 2 node setup on Azure (east us region) running Ubuntu server
 14.04LTS.
 Both nodes have 8GB RAM.

 One of the nodes (seed node) died with OOM - so, I am trying to add a
 replacement node with same configuration.

 The problem is this new node also keeps dying with OOM - I've restarted
 the cassandra service like 8-10 times hoping that it would finish the
 replication. But it didn't help.

 The one node that is still up is happily chugging along.
 All nodes have similar configuration - with libjna installed.

 Cassandra is installed from datastax's debian repo - pkg: dsc21 version
 2.1.7.
 I started off with the default configuration - i.e. the default
 cassandra-env.sh - which calculates the heap size automatically (1/4 * RAM
 = 2GB)

 But, that didn't help. So, I then tried to increase the heap to 4GB
 manually and restarted. It still keeps crashing.

 Any clue as to why it's happening?

 Thanks,
 Kunal
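
For reference, the manual 4GB override described above corresponds to the
standard cassandra-env.sh variables in 2.1; the new-gen value here is only
illustrative (the stock script expects the two to be set as a pair):

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"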



Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Kunal Gangakhedkar
I'm new to Cassandra.
How do I find those out? - mainly the partition params that you asked for.
The others, I think I can figure out.

We don't have any large objects/blobs in the column values - it's all
textual, date-time, numeric and uuid data.

We use Cassandra primarily to store segmentation data, with segment type
as the partition key. That is again divided into two separate column
families, but they have a similar structure.

The number of columns per row can be fairly large - each segment type is
the row key, with the associated user ids and timestamps as column values.

Thanks,
Kunal


Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Jack Krupansky
You, and only you, are responsible for knowing your data and data model.

If columns per row or rows per partition can be large, then an 8GB system
is probably too small. But the real issue is that you need to keep your
partition size from getting too large.

Generally, an 8GB system is okay, but only for reasonably-sized partitions,
like under 10MB.
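
As an illustration of bounding partition size, a time-bucketed variant of
the daily_challenges table from this thread might look like the following;
the month_bucket column is hypothetical, added purely to split partitions:

CREATE TABLE app_10001.daily_challenges_by_month (
    segment_type text,
    month_bucket text,  -- hypothetical bucket, e.g. '2015-07'
    date timestamp,
    user_id int,
    sess_id text,
    data text,
    deleted boolean,
    PRIMARY KEY ((segment_type, month_bucket), date, user_id, sess_id)
) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC);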


-- Jack Krupansky


Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Jack Krupansky
What does your data and data model look like - partition size, rows per
partition, number of columns per row, any large values/blobs in column
values?

You could run fine on an 8GB system, but only if your rows and partitions
are reasonably small. Any large partitions could blow you away.

-- Jack Krupansky

On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 Attaching the stack dump captured from the last OOM.

 Kunal

 On 10 July 2015 at 13:32, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 Forgot to mention: the data size is not that big - it's barely 10GB in
 all.

 Kunal

 On 10 July 2015 at 13:29, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 Hi,

 I have a 2 node setup on Azure (east us region) running Ubuntu server
 14.04LTS.
 Both nodes have 8GB RAM.

 One of the nodes (seed node) died with OOM - so, I am trying to add a
 replacement node with same configuration.

 The problem is this new node also keeps dying with OOM - I've restarted
 the cassandra service like 8-10 times hoping that it would finish the
 replication. But it didn't help.

 The one node that is still up is happily chugging along.
 All nodes have similar configuration - with libjna installed.

 Cassandra is installed from datastax's debian repo - pkg: dsc21 version
 2.1.7.
 I started off with the default configuration - i.e. the default
 cassandra-env.sh - which calculates the heap size automatically (1/4 * RAM
 = 2GB).

 But, that didn't help. So, I then tried to increase the heap to 4GB
 manually and restarted. It still keeps crashing.

 Any clue as to why it's happening?

 Thanks,
 Kunal






Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Kunal Gangakhedkar
Thanks for quick reply.

1. I don't know what thresholds I should look for. So, to save
this back-and-forth, I'm attaching the cfstats output for the keyspace.

There is one table - daily_challenges - which shows compacted partition max
bytes as ~460M and another one - daily_guest_logins - which shows compacted
partition max bytes as ~36M.

Can that be a problem?
Here is the CQL schema for the daily_challenges column family:

CREATE TABLE app_10001.daily_challenges (
segment_type text,
date timestamp,
user_id int,
sess_id text,
data text,
deleted boolean,
PRIMARY KEY (segment_type, date, user_id, sess_id)
) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);


2. I don't know - how do I check? As I mentioned, I just installed the
dsc21 update from datastax's debian repo (ver 2.1.7).

Really appreciate your help.

Thanks,
Kunal

On 10 July 2015 at 23:33, Sebastian Estevez sebastian.este...@datastax.com
wrote:

 1. You want to look at # of sstables in cfhistograms or in cfstats look at:
 Compacted partition maximum bytes
 Maximum live cells per slice

 2) No, here's the env.sh from 3.0 which should work with some tweaks:

 https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh

 You'll at least have to modify the jamm version to what's in yours. I
 think it's 2.5



 All the best,


 Sebastián Estévez

 Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

 On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar 
 kgangakhed...@gmail.com wrote:

 Thanks, Sebastian.

 Couple of questions (I'm really new to cassandra):
 1. How do I interpret the output of 'nodetool cfstats' to figure out the
 issues? Any documentation pointer on that would be helpful.

 2. I'm primarily a python/c developer - so, totally clueless about the JVM
 environment. So, please bear with me, as I would need a lot of hand-holding.
 Should I just copy+paste the settings you gave and try to restart the
 failing cassandra server?

 Thanks,
 Kunal

 On 10 July 2015 at 22:35, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 #1 You need more information.

 a) Take a look at your .hprof file (memory heap from the OOM) with an
 introspection tool like jhat or visualvm or java flight recorder and see
 what is using up your RAM.

 b) How big are your large rows (use nodetool cfstats on each node). If
 your data model is bad, you are going to have to re-design it no matter
 what.

 #2 As a possible workaround try using the G1GC allocator with the
 settings from c* 3.0 instead of CMS. I've seen lots of success with it
 lately (tl;dr G1GC is much simpler than CMS and almost as good as a finely
 tuned CMS). *Note:* Use it with the latest Java 8 from Oracle. Do *not*
 set the newgen size; G1 sets it dynamically:

 # min and max heap sizes should be set to the same value to avoid
 # stop-the-world GC pauses during resize, and so that we can lock the
 # heap in memory on startup to prevent any of it from being swapped
 # out.
 JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
 JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

 # Per-thread stack size.
 JVM_OPTS="$JVM_OPTS -Xss256k"

 # Use the Hotspot garbage-first collector.
 JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

 # Have the JVM do less remembered set work during STW, instead
 # preferring concurrent GC. Reduces p99.9 latency.
 JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"

 # The JVM maximum is 8 PGC threads and 1/4 

Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Kunal Gangakhedkar
And here is my cassandra-env.sh
https://gist.github.com/kunalg/2c092cb2450c62be9a20

Kunal

On 11 July 2015 at 00:04, Kunal Gangakhedkar kgangakhed...@gmail.com
wrote:

 From jhat output, top 10 entries for Instance Count for All Classes
 (excluding platform) shows:

 2088223 instances of class org.apache.cassandra.db.BufferCell
 1983245 instances of class
 org.apache.cassandra.db.composites.CompoundSparseCellName
 1885974 instances of class
 org.apache.cassandra.db.composites.CompoundDenseCellName
 63 instances of class
 org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
 503687 instances of class org.apache.cassandra.db.BufferDeletedCell
 378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
 101800 instances of class org.apache.cassandra.utils.concurrent.Ref
 101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
 90704 instances of class
 org.apache.cassandra.utils.concurrent.Ref$GlobalState
 71123 instances of class org.apache.cassandra.db.BufferDecoratedKey

 At the bottom of the page, it shows:
 Total of 8739510 instances occupying 193607512 bytes.
 JFYI.

 Kunal

 On 10 July 2015 at 23:49, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 Thanks for quick reply.

 1. I don't know what thresholds I should look for. So, to
 save this back-and-forth, I'm attaching the cfstats output for the keyspace.

 There is one table - daily_challenges - which shows compacted partition
 max bytes as ~460M and another one - daily_guest_logins - which shows
 compacted partition max bytes as ~36M.

 Can that be a problem?
 Here is the CQL schema for the daily_challenges column family:

 CREATE TABLE app_10001.daily_challenges (
 segment_type text,
 date timestamp,
 user_id int,
 sess_id text,
 data text,
 deleted boolean,
 PRIMARY KEY (segment_type, date, user_id, sess_id)
 ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
 AND bloom_filter_fp_chance = 0.01
 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
 AND comment = ''
 AND compaction = {'min_threshold': '4', 'class':
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
 'max_threshold': '32'}
 AND compression = {'sstable_compression':
 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99.0PERCENTILE';

 CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);


 2. I don't know - how do I check? As I mentioned, I just installed the
 dsc21 update from datastax's debian repo (ver 2.1.7).

 Really appreciate your help.

 Thanks,
 Kunal

 On 10 July 2015 at 23:33, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 1. You want to look at # of sstables in cfhistograms or in cfstats look
 at:
 Compacted partition maximum bytes
 Maximum live cells per slice

 2) No, here's the env.sh from 3.0 which should work with some tweaks:

 https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh

 You'll at least have to modify the jamm version to what's in yours. I
 think it's 2.5



 All the best,


 Sebastián Estévez

 Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

 On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar 
 kgangakhed...@gmail.com wrote:

 Thanks, Sebastian.

 Couple of questions (I'm really new to cassandra):
 1. How do I interpret the output of 'nodetool cfstats' to figure out
 the issues? Any documentation pointer on that would be helpful.

 2. I'm primarily a python/c developer - so, totally clueless about the JVM
 environment. So, please bear with me, as I would need a lot of hand-holding.
 Should I just copy+paste the settings you gave and try to restart the
 failing cassandra server?

 Thanks,
 Kunal

 On 10 July 2015 at 22:35, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 #1 You need more information.

 a) Take a look at your .hprof file (memory heap from the OOM) 

Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Kunal Gangakhedkar
Thanks, Sebastian.

Couple of questions (I'm really new to cassandra):
1. How do I interpret the output of 'nodetool cfstats' to figure out the
issues? Any documentation pointer on that would be helpful.

2. I'm primarily a python/c developer - so, totally clueless about the JVM
environment. So, please bear with me, as I would need a lot of hand-holding.
Should I just copy+paste the settings you gave and try to restart the
failing cassandra server?

Thanks,
Kunal

On 10 July 2015 at 22:35, Sebastian Estevez sebastian.este...@datastax.com
wrote:

 #1 You need more information.

 a) Take a look at your .hprof file (memory heap from the OOM) with an
 introspection tool like jhat or visualvm or java flight recorder and see
 what is using up your RAM.

 b) How big are your large rows (use nodetool cfstats on each node). If
 your data model is bad, you are going to have to re-design it no matter
 what.

 #2 As a possible workaround try using the G1GC allocator with the settings
 from c* 3.0 instead of CMS. I've seen lots of success with it lately (tl;dr
 G1GC is much simpler than CMS and almost as good as a finely tuned CMS).
 *Note:* Use it with the latest Java 8 from Oracle. Do *not* set the
 newgen size; G1 sets it dynamically:

 # min and max heap sizes should be set to the same value to avoid
 # stop-the-world GC pauses during resize, and so that we can lock the
 # heap in memory on startup to prevent any of it from being swapped
 # out.
 JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
 JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

 # Per-thread stack size.
 JVM_OPTS="$JVM_OPTS -Xss256k"

 # Use the Hotspot garbage-first collector.
 JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

 # Have the JVM do less remembered set work during STW, instead
 # preferring concurrent GC. Reduces p99.9 latency.
 JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"

 # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
 # Machines with > 10 cores may need additional threads.
 # Increase to <= full cores (do not count HT cores).
 #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
 #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"

 # Main G1GC tunable: lowering the pause target will lower throughput
 # and vice versa.
 # 200ms is the JVM default and lowest viable setting
 # 1000ms increases throughput. Keep it smaller than the timeouts in
 # cassandra.yaml.
 JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
 # Do reference processing in parallel GC.
 JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"

 # This may help eliminate STW.
 # The default in Hotspot 8u40 is 40%.
 #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

 # For workloads that do large allocations, increasing the region
 # size may make things more efficient. Otherwise, let the JVM
 # set this automatically.
 #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"

 # Make sure all memory is faulted and zeroed on startup.
 # This helps prevent soft faults in containers and makes
 # transparent hugepage allocation more effective.
 JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"

 # Biased locking does not benefit Cassandra.
 JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

 # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
 JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"

 # Enable thread-local allocation blocks and allow the JVM to automatically
 # resize them at runtime.
 JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"

 # http://www.evanjones.ca/jvm-mmap-pause.html
 JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"


 All the best,


 Sebastián Estévez

 Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

 On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar 
 kgangakhed...@gmail.com wrote:

 I upgraded my instance from 8GB to a 14GB one.
 Allocated 8GB to jvm heap in cassandra-env.sh.

 And now, it crashes even faster with an OOM..

 Earlier, with 4GB heap, I could go up to ~90% replication completion (as
 reported by nodetool netstats); now, with 8GB heap, I cannot even get
 there. I've already restarted cassandra service 4 times with 8GB heap.

 No clue what's going on.. :(

 Kunal

 On 10 July 2015 at 17:45, Jack Krupansky jack.krupan...@gmail.com
 wrote:

 You, and only you, are responsible for knowing your 

Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Sebastian Estevez
1. You want to look at # of sstables in cfhistograms or in cfstats look at:
Compacted partition maximum bytes
Maximum live cells per slice

2) No, here's the env.sh from 3.0 which should work with some tweaks:
https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh

You'll at least have to modify the jamm version to what's in yours. I think
it's 2.5
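
A minimal shell sketch of those two checks, assuming a 2.1-era nodetool and
the keyspace/table names from this thread:

# pull the two thresholds out of cfstats for the suspect table
nodetool cfstats app_10001.daily_challenges | \
    grep -E 'Compacted partition maximum bytes|Maximum live cells per slice'

# per-read sstable counts plus partition-size and cell-count histograms
nodetool cfhistograms app_10001 daily_challenges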



All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 Thanks, Sebastian.

 Couple of questions (I'm really new to cassandra):
 1. How do I interpret the output of 'nodetool cfstats' to figure out the
 issues? Any documentation pointer on that would be helpful.

 2. I'm primarily a python/c developer - so, totally clueless about the JVM
 environment. So, please bear with me, as I would need a lot of hand-holding.
 Should I just copy+paste the settings you gave and try to restart the
 failing cassandra server?

 Thanks,
 Kunal

 On 10 July 2015 at 22:35, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 #1 You need more information.

 a) Take a look at your .hprof file (memory heap from the OOM) with an
 introspection tool like jhat or visualvm or java flight recorder and see
 what is using up your RAM.

 b) How big are your large rows (use nodetool cfstats on each node). If
 your data model is bad, you are going to have to re-design it no matter
 what.

 #2 As a possible workaround try using the G1GC allocator with the
 settings from c* 3.0 instead of CMS. I've seen lots of success with it
 lately (tl;dr G1GC is much simpler than CMS and almost as good as a finely
 tuned CMS). *Note:* Use it with the latest Java 8 from Oracle. Do *not*
 set the newgen size; G1 sets it dynamically:

 # min and max heap sizes should be set to the same value to avoid
 # stop-the-world GC pauses during resize, and so that we can lock the
 # heap in memory on startup to prevent any of it from being swapped
 # out.
 JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
 JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

 # Per-thread stack size.
 JVM_OPTS="$JVM_OPTS -Xss256k"

 # Use the Hotspot garbage-first collector.
 JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

 # Have the JVM do less remembered set work during STW, instead
 # preferring concurrent GC. Reduces p99.9 latency.
 JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"

 # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
 # Machines with > 10 cores may need additional threads.
 # Increase to <= full cores (do not count HT cores).
 #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
 #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"

 # Main G1GC tunable: lowering the pause target will lower throughput
 # and vice versa.
 # 200ms is the JVM default and lowest viable setting
 # 1000ms increases throughput. Keep it smaller than the timeouts in
 # cassandra.yaml.
 JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
 # Do reference processing in parallel GC.
 JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"

 # This may help eliminate STW.
 # The default in Hotspot 8u40 is 40%.
 #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

 # For workloads that do large allocations, increasing the region
 # size may make things more efficient. Otherwise, let the JVM
 # set this automatically.
 #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"

 # Make sure all memory is faulted and zeroed on startup.
 # This helps prevent soft faults in containers and makes
 # transparent hugepage allocation more effective.
 JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"

 # Biased locking does not benefit Cassandra.
 JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

 # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
 JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"

 # Enable thread-local allocation blocks and allow the JVM to automatically
 # resize them at runtime.
 JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"

 # http://www.evanjones.ca/jvm-mmap-pause.html
 JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"


 All the best,



 Sebastián Estévez

 Solutions Architect | 954 905 8615 | 

Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Kunal Gangakhedkar
From jhat output, top 10 entries for Instance Count for All Classes
(excluding platform) shows:

2088223 instances of class org.apache.cassandra.db.BufferCell
1983245 instances of class
org.apache.cassandra.db.composites.CompoundSparseCellName
1885974 instances of class
org.apache.cassandra.db.composites.CompoundDenseCellName
63 instances of class
org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
503687 instances of class org.apache.cassandra.db.BufferDeletedCell
378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
101800 instances of class org.apache.cassandra.utils.concurrent.Ref
101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
90704 instances of class
org.apache.cassandra.utils.concurrent.Ref$GlobalState
71123 instances of class org.apache.cassandra.db.BufferDecoratedKey

At the bottom of the page, it shows:
Total of 8739510 instances occupying 193607512 bytes.
JFYI.

Kunal

On 10 July 2015 at 23:49, Kunal Gangakhedkar kgangakhed...@gmail.com
wrote:

 Thanks for quick reply.

 1. I don't know what thresholds I should look for. So, to
 save this back-and-forth, I'm attaching the cfstats output for the keyspace.

 There is one table - daily_challenges - which shows compacted partition
 max bytes as ~460M and another one - daily_guest_logins - which shows
 compacted partition max bytes as ~36M.

 Can that be a problem?
 Here is the CQL schema for the daily_challenges column family:

 CREATE TABLE app_10001.daily_challenges (
 segment_type text,
 date timestamp,
 user_id int,
 sess_id text,
 data text,
 deleted boolean,
 PRIMARY KEY (segment_type, date, user_id, sess_id)
 ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
 AND bloom_filter_fp_chance = 0.01
 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
 AND comment = ''
 AND compaction = {'min_threshold': '4', 'class':
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
 'max_threshold': '32'}
 AND compression = {'sstable_compression':
 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99.0PERCENTILE';

 CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);


 2. I don't know - how do I check? As I mentioned, I just installed the
 dsc21 update from datastax's debian repo (ver 2.1.7).

 Really appreciate your help.

 Thanks,
 Kunal

 On 10 July 2015 at 23:33, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 1. You want to look at # of sstables in cfhistograms or in cfstats look
 at:
 Compacted partition maximum bytes
 Maximum live cells per slice

 2) No, here's the env.sh from 3.0 which should work with some tweaks:

 https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh

 You'll at least have to modify the jamm version to what's in yours. I
 think it's 2.5



 All the best,


 Sebastián Estévez

 Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

 On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar 
 kgangakhed...@gmail.com wrote:

 Thanks, Sebastian.

 Couple of questions (I'm really new to cassandra):
 1. How do I interpret the output of 'nodetool cfstats' to figure out the
 issues? Any documentation pointer on that would be helpful.

 2. I'm primarily a python/c developer - so, totally clueless about the JVM
 environment. So, please bear with me, as I would need a lot of hand-holding.
 Should I just copy+paste the settings you gave and try to restart the
 failing cassandra server?

 Thanks,
 Kunal

 On 10 July 2015 at 22:35, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 #1 You need more information.

 a) Take a look at your .hprof file (memory heap from the OOM) with an
 introspection tool like jhat or visualvm or java flight recorder and see
 what is using up your RAM.

 b) How big are your large rows (use nodetool cfstats on each node). If
 your data 

Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Kunal Gangakhedkar
I upgraded my instance from 8GB to a 14GB one.
Allocated 8GB to jvm heap in cassandra-env.sh.

And now, it crashes even faster with an OOM..

Earlier, with 4GB heap, I could go up to ~90% replication completion (as
reported by nodetool netstats); now, with 8GB heap, I cannot even get
there. I've already restarted cassandra service 4 times with 8GB heap.
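
A quick sketch for watching whether the stream is actually progressing
between crashes (assumes the 2.1 netstats output, where finished files show
100%):

# refresh every 30s, hiding files that have already finished streaming
watch -n 30 'nodetool netstats | grep -v 100%'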

No clue what's going on.. :(

Kunal

On 10 July 2015 at 17:45, Jack Krupansky jack.krupan...@gmail.com wrote:

 You, and only you, are responsible for knowing your data and data model.

 If columns per row or rows per partition can be large, then an 8GB system
 is probably too small. But the real issue is that you need to keep your
 partition size from getting too large.

 Generally, an 8GB system is okay, but only for reasonably-sized
 partitions, like under 10MB.


 -- Jack Krupansky

 On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar 
 kgangakhed...@gmail.com wrote:

 I'm new to cassandra.
 How do I find those out - mainly, the partition params that you asked
 for? The others, I think I can figure out.

 We don't have any large objects/blobs in the column values - it's all
 textual, date-time, numeric and uuid data.

 We use cassandra primarily to store segmentation data - with segment type
 as the partition key. That is again divided into two separate column families,
 but they have a similar structure.

 Columns per row can be fairly large - each segment type is the row key,
 with the associated user ids and timestamps as column values.

 Thanks,
 Kunal

 On 10 July 2015 at 16:36, Jack Krupansky jack.krupan...@gmail.com
 wrote:

 What does your data and data model look like - partition size, rows per
 partition, number of columns per row, any large values/blobs in column
 values?

 You could run fine on an 8GB system, but only if your rows and
 partitions are reasonably small. Any large partitions could blow you away.

 -- Jack Krupansky

 On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar 
 kgangakhed...@gmail.com wrote:

 Attaching the stack dump captured from the last OOM.

 Kunal

 On 10 July 2015 at 13:32, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 Forgot to mention: the data size is not that big - it's barely 10GB in
 all.

 Kunal

 On 10 July 2015 at 13:29, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 Hi,

 I have a 2 node setup on Azure (east us region) running Ubuntu server
 14.04LTS.
 Both nodes have 8GB RAM.

 One of the nodes (the seed node) died with OOM - so, I am trying to add a
 replacement node with the same configuration.

 The problem is that this new node also keeps dying with OOM - I've
 restarted the cassandra service 8-10 times hoping that it would finish
 the replication, but it didn't help.

 The one node that is still up is happily chugging along.
 All nodes have similar configuration - with libjna installed.

 Cassandra is installed from datastax's debian repo - pkg: dsc21
 version 2.1.7.
 I started off with the default configuration - i.e. the default
 cassandra-env.sh - which calculates the heap size automatically (1/4 * RAM
 = 2GB).

 But, that didn't help. So, I then tried to increase the heap to 4GB
 manually and restarted. It still keeps crashing.

 Any clue as to why it's happening?

 Thanks,
 Kunal









Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Sebastian Estevez
#1 You need more information.

a) Take a look at your .hprof file (memory heap from the OOM) with an
introspection tool like jhat or visualvm or java flight recorder and see
what is using up your RAM.

b) How big are your large rows (use nodetool cfstats on each node). If your
data model is bad, you are going to have to re-design it no matter what.
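
As a sketch of (a), assuming the stock cassandra-env.sh already passes
-XX:+HeapDumpOnOutOfMemoryError so a dump was written (the path and pid
below are made up):

# jhat wants roughly as much heap as the dump is big; it serves a UI on :7000
jhat -J-Xmx6g /var/lib/cassandra/java_pid12345.hprof
# then open http://localhost:7000 and sort by instance counts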

#2 As a possible workaround try using the G1GC allocator with the settings
from c* 3.0 instead of CMS. I've seen lots of success with it lately (tl;dr
G1GC is much simpler than CMS and almost as good as a finely tuned CMS).
*Note:* Use it with the latest Java 8 from Oracle. Do *not* set the newgen
size; G1 sets it dynamically:

 # min and max heap sizes should be set to the same value to avoid
 # stop-the-world GC pauses during resize, and so that we can lock the
 # heap in memory on startup to prevent any of it from being swapped
 # out.
 JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
 JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

 # Per-thread stack size.
 JVM_OPTS="$JVM_OPTS -Xss256k"

 # Use the Hotspot garbage-first collector.
 JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

 # Have the JVM do less remembered set work during STW, instead
 # preferring concurrent GC. Reduces p99.9 latency.
 JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"

 # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
 # Machines with > 10 cores may need additional threads.
 # Increase to <= full cores (do not count HT cores).
 #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
 #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"

 # Main G1GC tunable: lowering the pause target will lower throughput
 # and vice versa.
 # 200ms is the JVM default and lowest viable setting
 # 1000ms increases throughput. Keep it smaller than the timeouts in
 # cassandra.yaml.
 JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
 # Do reference processing in parallel GC.
 JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"

 # This may help eliminate STW.
 # The default in Hotspot 8u40 is 40%.
 #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

 # For workloads that do large allocations, increasing the region
 # size may make things more efficient. Otherwise, let the JVM
 # set this automatically.
 #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"

 # Make sure all memory is faulted and zeroed on startup.
 # This helps prevent soft faults in containers and makes
 # transparent hugepage allocation more effective.
 JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"

 # Biased locking does not benefit Cassandra.
 JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

 # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
 JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"

 # Enable thread-local allocation blocks and allow the JVM to automatically
 # resize them at runtime.
 JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"

 # http://www.evanjones.ca/jvm-mmap-pause.html
 JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"


All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar 
kgangakhed...@gmail.com wrote:

 I upgraded my instance from 8GB to a 14GB one.
 Allocated 8GB to jvm heap in cassandra-env.sh.

 And now, it crashes even faster with an OOM..

 Earlier, with 4GB heap, I could go up to ~90% replication completion (as
 reported by nodetool netstats); now, with 8GB heap, I cannot even get
 there. I've already restarted cassandra service 4 times with 8GB heap.

 No clue what's going on.. :(

 Kunal

 On 10 July 2015 at 17:45, Jack Krupansky jack.krupan...@gmail.com wrote:

 You, and only you, are responsible for knowing your data and data model.

 If columns per row or rows per partition can be large, then an 8GB system
 is probably too small. But the real issue is that you need to keep your
 partition size from getting too large.

 Generally, an 8GB system is okay, but only for reasonably-sized
 partitions, like under 10MB.


 -- Jack Krupansky

 On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar 
 kgangakhed...@gmail.com wrote:

 I'm new to cassandra.
 How do I find those out - mainly, the partition params that you asked
 for? The others, I think I can figure out.

 We don't have any large 

Re: Cassandra OOM on joining existing ring

2015-07-10 Thread Sebastian Estevez
#1

 There is one table - daily_challenges - which shows compacted partition
 max bytes as ~460M and another one - daily_guest_logins - which shows
 compacted partition max bytes as ~36M.


460 MB is high; I like to keep my partitions under 100 MB when possible. I've
seen worse, though. The fix is to add something else (maybe month or week or
something) into your partition key:

 PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)
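
For example, bucketing by ISO week would give something like the sketch below
(the new table name, the week column, and its format are illustrative
placeholders; the application has to supply the bucket on every read and
write):

cqlsh -e "
CREATE TABLE app_10001.daily_challenges_v2 (
    segment_type text,
    week text,           -- application-supplied bucket, e.g. '2015-W28'
    date timestamp,
    user_id int,
    sess_id text,
    data text,
    deleted boolean,
    PRIMARY KEY ((segment_type, week), date, user_id, sess_id)
) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC);"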

#2 It looks like your jamm version is 3 per your env.sh, so you're probably okay
to copy the env.sh over from the C* 3.0 link I shared once you uncomment
and tweak the MAX_HEAP. If there's something wrong, your node won't come up;
tail your logs.
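
A sketch of that on a Debian dsc21 layout (the paths and the heap value are
assumptions, not recommendations):

# keep a copy of the old file before swapping in the 3.0-style env.sh
sudo cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
# uncomment/pin the heap; 8G here is just a placeholder to be tuned
sudo sed -i 's/^#MAX_HEAP_SIZE=.*/MAX_HEAP_SIZE="8G"/' /etc/cassandra/cassandra-env.sh
sudo service cassandra restart

# if something is wrong the node won't come up; the reason lands here
tail -f /var/log/cassandra/system.log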



All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Fri, Jul 10, 2015 at 2:44 PM, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 And here is my cassandra-env.sh
 https://gist.github.com/kunalg/2c092cb2450c62be9a20

 Kunal

 On 11 July 2015 at 00:04, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 From jhat output, top 10 entries for Instance Count for All Classes
 (excluding platform) shows:

 2088223 instances of class org.apache.cassandra.db.BufferCell
 1983245 instances of class
 org.apache.cassandra.db.composites.CompoundSparseCellName
 1885974 instances of class
 org.apache.cassandra.db.composites.CompoundDenseCellName
 63 instances of class
 org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
 503687 instances of class org.apache.cassandra.db.BufferDeletedCell
 378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
 101800 instances of class org.apache.cassandra.utils.concurrent.Ref
 101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State

 90704 instances of class
 org.apache.cassandra.utils.concurrent.Ref$GlobalState
 71123 instances of class org.apache.cassandra.db.BufferDecoratedKey

 At the bottom of the page, it shows:
 Total of 8739510 instances occupying 193607512 bytes.
 JFYI.

 Kunal

 On 10 July 2015 at 23:49, Kunal Gangakhedkar kgangakhed...@gmail.com
 wrote:

 Thanks for quick reply.

 1. I don't know what thresholds I should look for. So, to
 save this back-and-forth, I'm attaching the cfstats output for the keyspace.

 There is one table - daily_challenges - which shows compacted partition
 max bytes as ~460M and another one - daily_guest_logins - which shows
 compacted partition max bytes as ~36M.

 Can that be a problem?
 Here is the CQL schema for the daily_challenges column family:

 CREATE TABLE app_10001.daily_challenges (
 segment_type text,
 date timestamp,
 user_id int,
 sess_id text,
 data text,
 deleted boolean,
 PRIMARY KEY (segment_type, date, user_id, sess_id)
 ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
 AND bloom_filter_fp_chance = 0.01
 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
 AND comment = ''
 AND compaction = {'min_threshold': '4', 'class':
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
 'max_threshold': '32'}
 AND compression = {'sstable_compression':
 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99.0PERCENTILE';

 CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);


 2. I don't know - how do I check? As I mentioned, I just installed the
 dsc21 update from datastax's debian repo (ver 2.1.7).

 Really appreciate your help.

 Thanks,
 Kunal

 On 10 July 2015 at 23:33, Sebastian Estevez 
 sebastian.este...@datastax.com wrote:

 1. You want to look at # of sstables in cfhistograms or in cfstats look
 at:
 Compacted partition maximum bytes
 Maximum live cells per slice

 2) No, here's the env.sh from 3.0 which should work with some tweaks:

 https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh

 You'll at least have to modify the jamm version to what's in 

Re: Cassandra OOM, many deletedColumn

2013-03-13 Thread aaron morton
 For JVM Heap it is 2G
Try 4G

   and gc_grace = 1800
Realised that I did not provide a warning about the implications this has for
nodetool repair. If you are doing deletes on the CF you need to run nodetool
repair every gc_grace seconds.
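
With gc_grace = 1800 that means a repair of this CF has to finish at least
every 30 minutes, for example via a crontab entry like the sketch below (the
keyspace name is a placeholder, and -pr assumes every node gets the same
entry):

*/25 * * * * nodetool repair -pr <keyspace> purge >> /var/log/cassandra/repair.log 2>&1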

In this case I think your main problem was not enough JVM heap. Try setting it
to 4G and see how that goes.
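
In a 1.1-era cassandra-env.sh that is two variables near the top (the new-gen
figure is an assumption, roughly 100MB per core):

MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"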

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/03/2013, at 8:17 PM, 金剑 jinjia...@gmail.com wrote:

 Thanks for your reply. We will try both of your recommendations. The OS memory
 is 8G; for the JVM heap it is 2G. DeletedColumns used 1.4G, rooted from the
 readStage threads. Do you think we need to increase the size of the JVM heap?
 
  The configuration for the index column family is:
 
 create column family purge
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'UTF8Type'
   and read_repair_chance = 1.0
   and gc_grace = 1800
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy = 
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
 
 
 Best Regards!
 
 Jian Jin
 
 
 2013/3/9 aaron morton aa...@thelastpickle.com
 You need to provide some details of the machine and the JVM configuration. 
 But let's say you need to have 4GB to 8GB for the JVM heap.
 
 If you have many deleted columns I would say you have a *lot* of garbage in 
 each row. Consider reducing the gc_grace seconds so the columns are purged 
 more frequently; note however that columns are only purged when all fragments
 of the row are part of the minor compaction. 
 
 If you have a mixed write / delete workload consider using the Levelled
 compaction strategy 
 http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 6/03/2013, at 10:37 PM, Jason Wee peich...@gmail.com wrote:
 
 hmm.. did you manage to take a look using nodetool tpstats? That may give
 you further indication.
 
 Jason
 
 
 On Thu, Mar 7, 2013 at 1:56 PM, 金剑 jinjia...@gmail.com wrote:
 Hi,
 
 My version is 1.1.7.

 Our use case is: we have an index columnfamily to record how many resources
 are stored for a user. The number might vary from tens to millions.

 We provide a feature to let users delete resources according to a prefix.


 We found some cassandra nodes will OOM after some period. The cluster is a kind
 of cross-datacenter ring.
 
 1. Exception in cassandra log:
 
 ERROR [Thread-5810] 2013-02-04 05:38:13,882 AbstractCassandraDaemon.java 
 (line 135) Exception in thread Thread[Thread-5810,5,main] 
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down 
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) 
 at 
 java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655) 
 at 
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581) 
 at 
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
  
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
  
 ERROR [Thread-5819] 2013-02-04 05:38:13,888 AbstractCassandraDaemon.java 
 (line 135) Exception in thread Thread[Thread-5819,5,main] 
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down 
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) 
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) 
 at 
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581) 
 at 
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
  
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
  
 ERROR [Thread-36] 2013-02-04 05:38:13,898 AbstractCassandraDaemon.java (line 
 135) Exception in thread Thread[Thread-36,5,main] 
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down 
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) 
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) 
 at 
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581) 
 at 

Re: Cassandra OOM, many deletedColumn

2013-03-12 Thread 金剑
Thanks for your reply. We will try both of your recommendations. The OS
memory is 8G; for the JVM heap it is 2G. DeletedColumns used 1.4G, rooted
from the readStage threads. Do you think we need to increase the size of the
JVM heap?

 The configuration for the index column family is:

create column family purge
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and gc_grace = 1800
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';


Best Regards!

Jian Jin


2013/3/9 aaron morton aa...@thelastpickle.com

 You need to provide some details of the machine and the JVM configuration.
 But let's say you need to have 4GB to 8GB for the JVM heap.

 If you have many deleted columns I would say you have a *lot* of garbage
 in each row. Consider reducing the gc_grace seconds so the columns are
 purged more frequently; note however that columns are only purged when all
 fragments of the row are part of the minor compaction.

 If you have a mixed write / delete workload consider using the Levelled
 compaction strategy
 http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

 Cheers

-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 6/03/2013, at 10:37 PM, Jason Wee peich...@gmail.com wrote:

 hmm.. did you manage to take a look using nodetool tpstats? That may give
 you further indication.

 Jason


 On Thu, Mar 7, 2013 at 1:56 PM, 金剑 jinjia...@gmail.com wrote:

 Hi,

  My version is 1.1.7.

  Our use case is: we have an index columnfamily to record how many
  resources are stored for a user. The number might vary from tens to millions.

  We provide a feature to let users delete resources according to a prefix.


  We found some cassandra nodes will OOM after some period. The cluster is a
  kind of cross-datacenter ring.

 1. Exception in cassandra log:

 ERROR [Thread-5810] 2013-02-04 05:38:13,882 AbstractCassandraDaemon.java
 (line 135) Exception in thread Thread[Thread-5810,5,main]
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
 shut down
 at
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)

 at
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at
 java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)

 at
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)

 at
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)

 at
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)

 at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

 ERROR [Thread-5819] 2013-02-04 05:38:13,888 AbstractCassandraDaemon.java
 (line 135) Exception in thread Thread[Thread-5819,5,main]
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
 shut down
 at
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)

 at
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)

 at
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)

 at
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)

 at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

 ERROR [Thread-36] 2013-02-04 05:38:13,898 AbstractCassandraDaemon.java
 (line 135) Exception in thread Thread[Thread-36,5,main]
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
 shut down
 at
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)

 at
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)

 at
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)

 at
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)

 at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

 ERROR [Thread-3990] 2013-02-04 05:38:13,902 AbstractCassandraDaemon.java
 (line 135) Exception in thread Thread[Thread-3990,5,main]
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
 shut down
 at
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)

 at
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at
 

Re: Cassandra OOM, many deletedColumn

2013-03-08 Thread aaron morton
You need to provide some details of the machine and the JVM configuration. But
let's say you need to have 4GB to 8GB for the JVM heap.

If you have many deleted columns I would say you have a *lot* of garbage in
each row. Consider reducing the gc_grace seconds so the columns are purged more
frequently; note however that columns are only purged when all fragments of the
row are part of the minor compaction.

If you have a mixed write / delete workload consider using the Levelled
compaction strategy:
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
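
A minimal sketch of that switch with the 1.1-era cassandra-cli (the keyspace
name is a placeholder):

cassandra-cli -h localhost <<'EOF'
use your_keyspace;
update column family purge
  with compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy';
EOF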

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 10:37 PM, Jason Wee peich...@gmail.com wrote:

 hmm.. did you manage to take a look using nodetool tpstats? That may give
 you further indication.
 
 Jason
 
 
 On Thu, Mar 7, 2013 at 1:56 PM, 金剑 jinjia...@gmail.com wrote:
 Hi,
 
 My version is 1.1.7.

 Our use case is: we have an index columnfamily to record how many resources
 are stored for a user. The number might vary from tens to millions.

 We provide a feature to let users delete resources according to a prefix.


 We found some cassandra nodes will OOM after some period. The cluster is a kind
 of cross-datacenter ring.
 
 1. Exception in cassandra log:
 
 ERROR [Thread-5810] 2013-02-04 05:38:13,882 AbstractCassandraDaemon.java 
 (line 135) Exception in thread Thread[Thread-5810,5,main] 
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down 
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) 
 at 
 java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655) 
 at 
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581) 
 at 
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
  
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
  
 ERROR [Thread-5819] 2013-02-04 05:38:13,888 AbstractCassandraDaemon.java 
 (line 135) Exception in thread Thread[Thread-5819,5,main] 
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down 
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) 
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) 
 at 
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581) 
 at 
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
  
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
  
 ERROR [Thread-36] 2013-02-04 05:38:13,898 AbstractCassandraDaemon.java (line 
 135) Exception in thread Thread[Thread-36,5,main] 
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down 
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) 
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) 
 at 
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581) 
 at 
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
  
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
  
 ERROR [Thread-3990] 2013-02-04 05:38:13,902 AbstractCassandraDaemon.java 
 (line 135) Exception in thread Thread[Thread-3990,5,main] 
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down 
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
  
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) 
 at 
 java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) 
 at 
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581) 
 at 
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
  
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
  
 ERROR [ACCEPT-/10.139.50.62] AbstractCassandraDaemon.java (line 135) 
 Exception in thread Thread[ACCEPT-/10.139.50.62,5,main] 
 java.lang.RuntimeException: java.nio.channels.ClosedChannelException 
 at 
 org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:710)
  
 Caused by: java.nio.channels.ClosedChannelException 
 at 
 

Re: Cassandra OOM, many deletedColumn

2013-03-06 Thread Jason Wee
hmm.. did you manage to take a look using nodetool tpstats? That may give
you further indication.
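
For instance:

nodetool -h localhost tpstats
# a persistently non-zero Pending or Blocked count on ReadStage or
# MutationStage is often the first sign of a node falling behind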

Jason


On Thu, Mar 7, 2013 at 1:56 PM, 金剑 jinjia...@gmail.com wrote:

 Hi,

 My version is 1.1.7.

 Our use case is: we have an index columnfamily to record how many resources
 are stored for a user. The number might vary from tens to millions.

 We provide a feature to let users delete resources according to a prefix.


 We found some cassandra nodes will OOM after some period. The cluster is a kind
 of cross-datacenter ring.

 1. Exception in cassandra log:

 ERROR [Thread-5810] 2013-02-04 05:38:13,882 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-5810,5,main]
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
 at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)
 at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
 at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

 ERROR [Thread-5819] 2013-02-04 05:38:13,888 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-5819,5,main]
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
 at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

 ERROR [Thread-36] 2013-02-04 05:38:13,898 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-36,5,main]
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
 at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

 ERROR [Thread-3990] 2013-02-04 05:38:13,902 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-3990,5,main]
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
 at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

 ERROR [ACCEPT-/10.139.50.62] AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ACCEPT-/10.139.50.62,5,main]
 java.lang.RuntimeException: java.nio.channels.ClosedChannelException
 at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:710)
 Caused by: java.nio.channels.ClosedChannelException
 at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:137)
 at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
 at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:699)

 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 374) Timed out replaying hints to /23.20.84.240; aborting further deliveries
 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint
 INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 296) Started hinted handoff for token: 3

 2. From the heap dump, there are many DeletedColumn objects, rooted in the
 readStage thread.

 Please help: where might the problem be?

 Best Regards!

 Jian Jin



Re: Cassandra OOM crash while mapping commitlog

2012-08-15 Thread Robin Verlangen
Everything still runs smoothly. It's quite plausible that the 1.1.3 release
resolved this bug.

2012/8/13 Robin Verlangen ro...@us2.nl

 Three hours ago I finished the upgrade of our cluster. Currently it runs
 quite smoothly. I'll give an update within a week on whether this really
 solved our issues.

 Cheers!


 2012/8/13 Robin Verlangen ro...@us2.nl

 @Tyler: We were already running most of our machines on a 64-bit JVM (Sun,
 not OpenJDK). Those also crashed.

 @Holger: Good to hear that. I'll schedule an update for our Cassandra
 cluster.

 Thank you both for your time.


 2012/8/13 Holger Hoffstaette holger.hoffstae...@googlemail.com

 On Sun, 12 Aug 2012 13:36:42 +0200, Robin Verlangen wrote:

  Hmm, is this issue caused by some 1.x version? It never occurred for us
 before.

 This bug was introduced in 1.1.0 and has been fixed in 1.1.3, where the
 closed/recycled segments are now closed and unmapped properly. The default
 sizes are also smaller.
 Of course the question remains why an append-only commitlog needs to be
 mmap'ed in the first place, especially for writing..

 -h





 --
 With kind regards,

 Robin Verlangen
 Software engineer
 W http://www.robinverlangen.nl
 E ro...@us2.nl




 --
 With kind regards,

 Robin Verlangen
 Software engineer
 W http://www.robinverlangen.nl
 E ro...@us2.nl




-- 
With kind regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl


Re: Cassandra OOM crash while mapping commitlog

2012-08-13 Thread Robin Verlangen
@Tyler: We were already running most of our machines on a 64-bit JVM (Sun, not
OpenJDK). Those also crashed.

@Holger: Good to hear that. I'll schedule an update for our Cassandra
cluster.

Thank you both for your time.

2012/8/13 Holger Hoffstaette holger.hoffstae...@googlemail.com

 On Sun, 12 Aug 2012 13:36:42 +0200, Robin Verlangen wrote:

  Hmm, is this issue caused by some 1.x version? It never occurred for us before.

 This bug was introduced in 1.1.0 and has been fixed in 1.1.3, where the
 closed/recycled segments are now closed and unmapped properly. The default
 sizes are also smaller.
 Of course the question remains why an append-only commitlog needs to be
 mmap'ed in the first place, especially for writing..

 -h





-- 
With kind regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl


Re: Cassandra OOM crash while mapping commitlog

2012-08-13 Thread Robin Verlangen
Three hours ago I finished the upgrade of our cluster. Currently it runs quite
smoothly. I'll give an update within a week on whether this really solved our issues.

Cheers!

2012/8/13 Robin Verlangen ro...@us2.nl

 @Tyler: We were already running most of our machines on a 64-bit JVM (Sun,
 not OpenJDK). Those also crashed.

 @Holger: Good to hear that. I'll schedule an update for our Cassandra
 cluster.

 Thank you both for your time.


 2012/8/13 Holger Hoffstaette holger.hoffstae...@googlemail.com

 On Sun, 12 Aug 2012 13:36:42 +0200, Robin Verlangen wrote:

  Hmm, is this issue caused by some 1.x version? It never occurred for us
 before.

 This bug was introduced in 1.1.0 and has been fixed in 1.1.3, where the
 closed/recycled segments are now closed and unmapped properly. The default
 sizes are also smaller.
 Of course the question remains why an append-only commitlog needs to be
 mmap'ed in the first place, especially for writing..

 -h





 --
 With kind regards,

 Robin Verlangen
 Software engineer
 W http://www.robinverlangen.nl
 E ro...@us2.nl




-- 
With kind regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl


Re: Cassandra OOM crash while mapping commitlog

2012-08-12 Thread Robin Verlangen
Hmm, is this issue caused by some 1.x version? It never occurred for us before.
On 11 Aug 2012 at 22:36, Tyler Hobbs ty...@datastax.com wrote:

 We've seen something similar when running on a 32-bit JVM, so make sure
 you're using the latest 64-bit Java 6 JVM.

 On Sat, Aug 11, 2012 at 11:59 AM, Robin Verlangen ro...@us2.nl wrote:

 Hi there,

 I currently see Cassandra crash every couple of days. I run a 3-node
 cluster on version 1.1.2. Does anyone have a clue why it crashes? I
 couldn't find a fix for it in any newer release. Is this an actual bug, or
 did I do something wrong?

 Thank you in advance for your time.

 Last 100 log lines before crash:

 INFO [FlushWriter:39] 2012-08-11 12:51:00,933 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-hd-7-Data.db (10778171 bytes) for commitlog position ReplayPosition(segmentId=2831860362157183, position=89962041)
 INFO [OptionalTasks:1] 2012-08-11 13:12:30,940 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74393593 bytes)
 INFO [OptionalTasks:1] 2012-08-11 13:12:30,941 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@32552383(22883734/74393593 serialized/live bytes, 227279 ops)
 INFO [FlushWriter:40] 2012-08-11 13:12:30,941 Memtable.java (line 266) Writing Memtable-wordevents@32552383(22883734/74393593 serialized/live bytes, 227279 ops)
 INFO [FlushWriter:40] 2012-08-11 13:12:31,703 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-158-Data.db (11800327 bytes) for commitlog position ReplayPosition(segmentId=2831860362157183, position=116934579)
 INFO [MemoryMeter:1] 2012-08-11 14:01:36,942 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='rollups7200') liveRatio is 6.158919689235077 (just-counted was 4.408341190092955).  calculation took 100ms for 16409 columns
 INFO [CompactionExecutor:88] 2012-08-11 14:08:27,875 AutoSavingCache.java (line 262) Saved KeyCache (38164 items) in 70 ms
 INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74346493 bytes)
 INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@10789879(22869246/74346493 serialized/live bytes, 226341 ops)
 INFO [FlushWriter:41] 2012-08-11 14:18:37,520 Memtable.java (line 266) Writing Memtable-wordevents@10789879(22869246/74346493 serialized/live bytes, 226341 ops)
 INFO [FlushWriter:41] 2012-08-11 14:18:38,288 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-159-Data.db (11796722 bytes) for commitlog position ReplayPosition(segmentId=2838466681767183, position=67094743)
 WARN [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 197) setting live ratio to minimum of 1.0 instead of 0.45760196307363504
 INFO [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 213) CFS(Keyspace='Wupa', ColumnFamily='PageViewsHost') liveRatio is 1.0421914932457101 (just-counted was 1.0).  calculation took 2ms for 175 columns
 INFO [MemoryMeter:1] 2012-08-11 14:33:20,916 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='rollups60') liveRatio is 4.067582667928898 (just-counted was 4.031462910772899).  calculation took 711ms for 169224 columns
 INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='pdps') (estimated 74395427 bytes)
 INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-pdps@30500189(9222554/74395427 serialized/live bytes, 214478 ops)
 INFO [FlushWriter:42] 2012-08-11 14:59:20,910 Memtable.java (line 266) Writing Memtable-pdps@30500189(9222554/74395427 serialized/live bytes, 214478 ops)
 INFO [FlushWriter:42] 2012-08-11 14:59:21,420 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/OpsCenter/pdps/OpsCenter-pdps-hd-11351-Data.db (6928124 bytes) for commitlog position ReplayPosition(segmentId=2838466681767183, position=117115966)
 INFO [MemoryMeter:1] 2012-08-11 14:59:31,138 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='pdps') liveRatio is 14.460953759840738 (just-counted was 14.460953759840738).  calculation took 28ms for 878 columns
 INFO [OptionalTasks:1] 2012-08-11 15:25:41,366 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74974061 bytes)
 INFO [OptionalTasks:1] 2012-08-11 15:25:41,367 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@24703812(23062288/74974061

Re: Cassandra OOM crash while mapping commitlog

2012-08-12 Thread Holger Hoffstaette
On Sun, 12 Aug 2012 13:36:42 +0200, Robin Verlangen wrote:

 Hmm, is this issue caused by some 1.x version? It never occurred for us before.

This bug was introduced in 1.1.0 and has been fixed in 1.1.3, where the
closed/recycled segments are now closed and unmapped properly. The default
sizes are also smaller.
Of course the question remains why an append-only commitlog needs to be
mmap'ed in the first place, especially for writing..
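
The mechanism behind the leak is that Java offers no public way to unmap a
MappedByteBuffer: the mapping lives until GC finalizes the buffer. The usual
workaround is to force-unmap through the JDK-internal cleaner, roughly like
the sketch below (a non-portable, pre-Java-9 Sun JDK pattern given purely for
illustration; it approximates, but is not, Cassandra's exact fix):

    import java.lang.reflect.Method;
    import java.nio.MappedByteBuffer;

    public final class Unmapper {
        // Best-effort explicit unmap via the buffer's cleaner; without this,
        // a "closed" segment still pins its virtual address space.
        public static void unmap(MappedByteBuffer buffer) {
            try {
                Method cleanerMethod = buffer.getClass().getMethod("cleaner");
                cleanerMethod.setAccessible(true);
                Object cleaner = cleanerMethod.invoke(buffer);
                cleaner.getClass().getMethod("clean").invoke(cleaner);
            } catch (Exception e) {
                // Fall back to waiting for GC to collect the mapping.
            }
        }

        private Unmapper() {}
    }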

-h




Re: Cassandra OOM crash while mapping commitlog

2012-08-11 Thread Tyler Hobbs
We've seen something similar when running on a 32-bit JVM, so make sure
you're using the latest 64-bit Java 6 JVM.

On Sat, Aug 11, 2012 at 11:59 AM, Robin Verlangen ro...@us2.nl wrote:

 Hi there,

 I currently see Cassandra crash every couple of days. I run a 3-node
 cluster on version 1.1.2. Does anyone have a clue why it crashes? I
 couldn't find a fix for it in any newer release. Is this an actual bug, or
 did I do something wrong?

 Thank you in advance for your time.

 Last 100 log lines before crash:

 INFO [FlushWriter:39] 2012-08-11 12:51:00,933 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-hd-7-Data.db (10778171 bytes) for commitlog position ReplayPosition(segmentId=2831860362157183, position=89962041)
 INFO [OptionalTasks:1] 2012-08-11 13:12:30,940 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74393593 bytes)
 INFO [OptionalTasks:1] 2012-08-11 13:12:30,941 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@32552383(22883734/74393593 serialized/live bytes, 227279 ops)
 INFO [FlushWriter:40] 2012-08-11 13:12:30,941 Memtable.java (line 266) Writing Memtable-wordevents@32552383(22883734/74393593 serialized/live bytes, 227279 ops)
 INFO [FlushWriter:40] 2012-08-11 13:12:31,703 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-158-Data.db (11800327 bytes) for commitlog position ReplayPosition(segmentId=2831860362157183, position=116934579)
 INFO [MemoryMeter:1] 2012-08-11 14:01:36,942 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='rollups7200') liveRatio is 6.158919689235077 (just-counted was 4.408341190092955).  calculation took 100ms for 16409 columns
 INFO [CompactionExecutor:88] 2012-08-11 14:08:27,875 AutoSavingCache.java (line 262) Saved KeyCache (38164 items) in 70 ms
 INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74346493 bytes)
 INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@10789879(22869246/74346493 serialized/live bytes, 226341 ops)
 INFO [FlushWriter:41] 2012-08-11 14:18:37,520 Memtable.java (line 266) Writing Memtable-wordevents@10789879(22869246/74346493 serialized/live bytes, 226341 ops)
 INFO [FlushWriter:41] 2012-08-11 14:18:38,288 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-159-Data.db (11796722 bytes) for commitlog position ReplayPosition(segmentId=2838466681767183, position=67094743)
 WARN [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 197) setting live ratio to minimum of 1.0 instead of 0.45760196307363504
 INFO [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 213) CFS(Keyspace='Wupa', ColumnFamily='PageViewsHost') liveRatio is 1.0421914932457101 (just-counted was 1.0).  calculation took 2ms for 175 columns
 INFO [MemoryMeter:1] 2012-08-11 14:33:20,916 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='rollups60') liveRatio is 4.067582667928898 (just-counted was 4.031462910772899).  calculation took 711ms for 169224 columns
 INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='pdps') (estimated 74395427 bytes)
 INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-pdps@30500189(9222554/74395427 serialized/live bytes, 214478 ops)
 INFO [FlushWriter:42] 2012-08-11 14:59:20,910 Memtable.java (line 266) Writing Memtable-pdps@30500189(9222554/74395427 serialized/live bytes, 214478 ops)
 INFO [FlushWriter:42] 2012-08-11 14:59:21,420 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/OpsCenter/pdps/OpsCenter-pdps-hd-11351-Data.db (6928124 bytes) for commitlog position ReplayPosition(segmentId=2838466681767183, position=117115966)
 INFO [MemoryMeter:1] 2012-08-11 14:59:31,138 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='pdps') liveRatio is 14.460953759840738 (just-counted was 14.460953759840738).  calculation took 28ms for 878 columns
 INFO [OptionalTasks:1] 2012-08-11 15:25:41,366 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74974061 bytes)
 INFO [OptionalTasks:1] 2012-08-11 15:25:41,367 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@24703812(23062288/74974061 serialized/live bytes, 228878 ops)
 INFO [FlushWriter:43] 2012-08-11 15:25:41,367 Memtable.java (line 266) Writing

Re: Cassandra OOM - 1.0.2

2012-02-07 Thread aaron morton
Just to ask the stupid question, have you tried setting it really high? Like
50?

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/02/2012, at 10:27 AM, Ajeet Grewal wrote:

 Here are the last few lines of strace (of one of the threads). There
 are a bunch of mmap system calls. Notice the last mmap call a couple
 of lines before the trace ends. Could the last mmap call fail?
 
 == BEGIN STRACE ==
 mmap(NULL, 2147487599, PROT_READ, MAP_SHARED, 37, 0xbb000) = 0x7709b54000
 fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
 mmap(NULL, 214743, PROT_READ, MAP_SHARED, 37, 0xc7fffb000) = 0x7789b55000
 fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
 mmap(NULL, 2147483522, PROT_READ, MAP_SHARED, 37, 0xc4000) = 0x7809b4f000
 fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
 mmap(NULL, 1586100174, PROT_READ, MAP_SHARED, 37, 0xd7fff3000) = 0x7889b4f000
 dup2(40, 37)= 37
 close(37)   = 0
  open("/home/y/var/fresh_cassandra/data/fresh/counter_object-h-4240-Filter.db", O_RDONLY) = 37
 .
 .
 .
 .
 close(37)   = 0
 futex(0x2ab5a39754, FUTEX_WAKE, 1)  = 1
 futex(0x2ab5a39750, FUTEX_WAKE, 1)  = 1
 futex(0x40116940, FUTEX_WAKE, 1)= 1
 mmap(0x41a17000, 12288, PROT_NONE,
 MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x41a17000
 rt_sigprocmask(SIG_SETMASK, [QUIT], NULL, 8) = 0
 _exit(0)= ?
 == END STRACE ==
 
 -- 
 Regards,
 Ajeet



Re: Cassandra OOM - 1.0.2

2012-02-07 Thread Ajeet Grewal
On Tue, Feb 7, 2012 at 10:45 AM, aaron morton aa...@thelastpickle.com wrote:
 Just to ask the stupid question, have you tried setting it really high?
 Like 50?


No, I have not. I moved to mmap_index_only as a stopgap solution.

Is it possible for there to be that many mmaps for about 300 db files?

-- 
Regards,
Ajeet


Re: Cassandra OOM - 1.0.2

2012-02-06 Thread Ajeet Grewal
On Sat, Feb 4, 2012 at 7:03 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Sounds like you need to increase sysctl vm.max_map_count

This did not work. I increased vm.max_map_count from 65536 to 131072.
I am still getting the same error.

ERROR [SSTableBatchOpen:4] 2012-02-06 11:43:50,463 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[SSTableBatchOpen:4,5,main]
java.io.IOError: java.io.IOException: Map failed
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:225)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:202)
at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:380)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:159)
at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:197)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Map failed
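
For what it's worth, the "Map failed" comes out of FileChannel.map. A
MappedByteBuffer is indexed by int, so a large SSTable has to be mapped in
segments of at most ~2GB each, which means the number of mappings grows with
total data size rather than with the file count. A minimal sketch of that
segmented mapping (a hypothetical standalone class, not Cassandra's actual
MmappedSegmentedFile):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.ArrayList;
    import java.util.List;

    public class SegmentMapDemo {
        // Each segment is capped at Integer.MAX_VALUE bytes (~2GB).
        private static final long MAX_SEGMENT_SIZE = Integer.MAX_VALUE;

        public static List<MappedByteBuffer> mapSegments(String path) throws Exception {
            List<MappedByteBuffer> segments = new ArrayList<MappedByteBuffer>();
            RandomAccessFile raf = new RandomAccessFile(path, "r");
            try {
                FileChannel channel = raf.getChannel();
                long size = channel.size();
                // One map() call per ~2GB of file; each consumes one kernel
                // map entry counted against vm.max_map_count, and map()
                // throws IOException("Map failed") once the budget runs out.
                for (long pos = 0; pos < size; pos += MAX_SEGMENT_SIZE) {
                    long length = Math.min(MAX_SEGMENT_SIZE, size - pos);
                    segments.add(channel.map(FileChannel.MapMode.READ_ONLY, pos, length));
                }
            } finally {
                raf.close(); // the mappings stay valid after the channel closes
            }
            return segments;
        }
    }

So a 60GB data file alone accounts for ~30 mappings, and every SSTable that is
still referenced keeps its own.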

--
Regards,
Ajeet


Re: Cassandra OOM - 1.0.2

2012-02-06 Thread Ajeet Grewal
On Mon, Feb 6, 2012 at 11:50 AM, Ajeet Grewal asgre...@gmail.com wrote:
 On Sat, Feb 4, 2012 at 7:03 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Sounds like you need to increase sysctl vm.max_map_count

 This did not work. I increased vm.max_map_count from 65536 to 131072.
 I am still getting the same error.

The number of files in the data directory is small (~300), so I don't
see why mmap should fail because of this.

-- 
Regards,
Ajeet


Re: Cassandra OOM - 1.0.2

2012-02-06 Thread Ajeet Grewal
Here are the last few lines of strace (of one of the threads). There
are a bunch of mmap system calls. Notice the last mmap call a couple
of lines before the trace ends. Could the last mmap call fail?

== BEGIN STRACE ==
mmap(NULL, 2147487599, PROT_READ, MAP_SHARED, 37, 0xbb000) = 0x7709b54000
fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
mmap(NULL, 214743, PROT_READ, MAP_SHARED, 37, 0xc7fffb000) = 0x7789b55000
fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
mmap(NULL, 2147483522, PROT_READ, MAP_SHARED, 37, 0xc4000) = 0x7809b4f000
fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
mmap(NULL, 1586100174, PROT_READ, MAP_SHARED, 37, 0xd7fff3000) = 0x7889b4f000
dup2(40, 37)= 37
close(37)   = 0
open("/home/y/var/fresh_cassandra/data/fresh/counter_object-h-4240-Filter.db", O_RDONLY) = 37
.
.
.
.
close(37)   = 0
futex(0x2ab5a39754, FUTEX_WAKE, 1)  = 1
futex(0x2ab5a39750, FUTEX_WAKE, 1)  = 1
futex(0x40116940, FUTEX_WAKE, 1)= 1
mmap(0x41a17000, 12288, PROT_NONE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x41a17000
rt_sigprocmask(SIG_SETMASK, [QUIT], NULL, 8) = 0
_exit(0)= ?
== END STRACE ==

-- 
Regards,
Ajeet


Re: Cassandra OOM - 1.0.2

2012-02-04 Thread Jonathan Ellis
Sounds like you need to increase sysctl vm.max_map_count

On Fri, Feb 3, 2012 at 7:27 PM, Ajeet Grewal asgre...@gmail.com wrote:
 Hey guys,

 I am getting an out of memory (mmap failed) error with Cassandra
 1.0.2. The relevant log lines are pasted at
 http://pastebin.com/UM28ZC1g.

 Cassandra works fine until it reaches about 300-400GB of load (on one
 instance, I have 12 nodes RF=2). Then nodes start failing with such
 errors. The nodes are pretty beefy, 32GB of ram, 8 cores. Increasing
 the JVM heap size does not help.

 I am running on a 64-bit JVM. I am using JNA. I have memlock unlimited
 for the user (I confirmed this by looking at /proc/pid/limits).

 I also tried restarting the process as root, but it crashes with the same 
 error.

 Also the number of files that I have in the data directory is about
 ~300, so it should not be exceeding the open files limit.

 I don't know if this is relevant. I just have two column families,
 counter_object and counter_time. I am using very wide columns, so row
 sizes can be huge. You can see from the log link that the *.db files
 are sometimes pretty big.

 Please help! Thank you!

 --
 Regards,
 Ajeet



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Cassandra OOM

2012-01-13 Thread Віталій Тимчишин
2012/1/4 Vitalii Tymchyshyn tiv...@gmail.com

 On 04.01.12 14:25, Radim Kolar wrote:

   So, what are Cassandra's memory requirements? Is it 1% or 2% of disk data?
 It depends on the number of rows you have. If you have a lot of rows, then the
 primary memory eaters are index sampling data and bloom filters. I use
 index sampling of 512 and bloom filters set to 4% to cut down the memory needed.

 I've raised index sampling, and the bloom filter setting seems not to be on
 trunk yet. For me, memtables are what's eating the heap :(


Hello, all.

I've found and fixed the problem today (after one of my nodes OOMed
constantly while replaying the commitlog on start-up): full-key deletes are
not accounted for, so column families with delete-only operations are never
flushed. Here is the Jira: https://issues.apache.org/jira/browse/CASSANDRA-3741
and my pull request to fix it: https://github.com/apache/cassandra/pull/5
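
To illustrate the mechanism with a toy model (illustrative only; neither the
actual Cassandra code nor the patch): if flush decisions are driven solely by
bytes of live column data, a mutation that deletes a whole row contributes
zero bytes, so a delete-only workload accumulates row tombstones on the heap
without ever tripping a flush.

    import java.util.HashMap;
    import java.util.Map;

    public class ToyMemtable {
        private final Map<String, byte[]> rows = new HashMap<String, byte[]>();
        private final Map<String, Long> rowTombstones = new HashMap<String, Long>();
        private long liveBytes = 0;
        private static final long FLUSH_THRESHOLD = 64L * 1024 * 1024;

        public void insert(String key, byte[] value) {
            rows.put(key, value);
            liveBytes += value.length;          // counted toward the threshold
            maybeFlush();
        }

        public void deleteRow(String key, long timestamp) {
            rows.remove(key);
            rowTombstones.put(key, timestamp);  // occupies heap...
            maybeFlush();                       // ...but never triggers this
        }

        private void maybeFlush() {
            if (liveBytes >= FLUSH_THRESHOLD) {  // deletes add 0 to liveBytes
                rows.clear();
                rowTombstones.clear();
                liveBytes = 0;
            }
        }
    }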

Best regards, Vitalii Tymchyshyn


Re: Cassandra OOM

2012-01-04 Thread Vitalii Tymchyshyn

Hello.

BTW: it would be great for Cassandra to shut down on Errors like OOM,
because right now I am not sure whether the problem described in the previous
email is the root cause, or whether one of the OOM errors found in the log
made some writer stop.

I am now looking at different OOMs in my cluster. Currently each node
has up to 300G of data in ~10 column families. The previous heap size of 3G
seems not to be enough, so I am raising it to 5G. Looking at heap dumps, a
lot of memory is taken by memtables, much more than 1/3 of the heap. At the
same time, the logs say that there is nothing to flush since there are no
dirty memtables. So, what are Cassandra's memory requirements? Is it 1% or
2% of disk data? Or maybe I am doing something wrong?


Best regards, Vitalii Tymchyshyn

On 03.01.12 20:58, aaron morton wrote:
The DynamicSnitch can result in fewer read operations being sent to a
node, but as long as a node is marked as UP, mutations are sent to all
replicas. Nodes will shed load when they pull messages off the queue
that have expired past rpc_timeout, but they will not feed back flow
control to the other nodes, other than going down or performing slowly
enough for the dynamic snitch to route reads around them.


There are also safety valves in there to reduce the size of the 
memtables and caches in response to low memory. Perhaps that process 
could also shed messages from thread pools with a high number of 
pending messages.


**But** going OOM with 2M+ mutations in the thread pool sounds like
the server was going down anyway. Did you look into why all the
messages were there?


Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 11:18 PM, Віталій Тимчишин wrote:


Hello.

We have been using Cassandra for some time in our project. Currently we are
on the 1.1 trunk (it was an accidental migration, but since it's hard to
migrate back and it's performing well enough, we are currently on 1.1).
During the New Year holidays one of the servers produced a number of
OOM messages in the log.
According to a heap dump taken, most of the memory is taken by the
MutationStage queue (over 2 million items).
So, I am curious now whether Cassandra has any flow control for messages?
We are using Quorum for writes, and it seems to me that one slow
server may start getting more messages than it can consume. The
writes will still succeed, performed by the other servers in the
replica set.
If there is no flow control, it will eventually OOM. Is that the
case? Are there any plans to handle this?
BTW: A lot of memory (~half) is taken by Inet4Address objects, so
making a cache of such objects would make this problem less likely.
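
A minimal sketch of such an interning cache (a hypothetical illustration, not
Cassandra code): canonicalize InetAddress instances so that millions of queued
messages share one object per peer instead of carrying one copy each.

    import java.net.InetAddress;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public final class InetAddressCache {
        private static final ConcurrentMap<InetAddress, InetAddress> CACHE =
                new ConcurrentHashMap<InetAddress, InetAddress>();

        // Returns a canonical instance; InetAddress implements value-based
        // equals/hashCode, so equal addresses collapse to one object.
        public static InetAddress intern(InetAddress addr) {
            InetAddress existing = CACHE.putIfAbsent(addr, addr);
            return existing != null ? existing : addr;
        }

        private InetAddressCache() {}
    }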


--
Best regards,
 Vitalii Tymchyshyn






Re: Cassandra OOM

2012-01-04 Thread Vitalii Tymchyshyn

On 04.01.12 14:25, Radim Kolar wrote:

 So, what are Cassandra's memory requirements? Is it 1% or 2% of disk data?
It depends on the number of rows you have. If you have a lot of rows, then the
primary memory eaters are index sampling data and bloom filters. I use
index sampling of 512 and bloom filters set to 4% to cut down the memory needed.
I've raised index sampling, and the bloom filter setting seems not to be on
trunk yet. For me, memtables are what's eating the heap :(


Best regards, Vitalii Tymchyshyn.


Re: Cassandra OOM

2012-01-03 Thread aaron morton
The DynamicSnitch can result in fewer read operations being sent to a node, but
as long as a node is marked as UP, mutations are sent to all replicas. Nodes
will shed load when they pull messages off the queue that have expired past
rpc_timeout, but they will not feed back flow control to the other nodes, other
than going down or performing slowly enough for the dynamic snitch to route
reads around them.

There are also safety valves in there to reduce the size of the memtables and 
caches in response to low memory. Perhaps that process could also shed messages 
from thread pools with a high number of pending messages. 

**But** going OOM with 2M+ mutations in the thread pool sounds like the server
was going down anyway. Did you look into why all the messages were there?

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 11:18 PM, Віталій Тимчишин wrote:

 Hello.
 
 We have been using Cassandra for some time in our project. Currently we are on
 the 1.1 trunk (it was an accidental migration, but since it's hard to migrate
 back and it's performing well enough, we are currently on 1.1).
 During the New Year holidays one of the servers produced a number of OOM
 messages in the log.
 According to a heap dump taken, most of the memory is taken by the MutationStage
 queue (over 2 million items).
 So, I am curious now whether Cassandra has any flow control for messages? We are
 using Quorum for writes, and it seems to me that one slow server may start
 getting more messages than it can consume. The writes will still succeed,
 performed by the other servers in the replica set.
 If there is no flow control, it will eventually OOM. Is that the case?
 Are there any plans to handle this?
 BTW: A lot of memory (~half) is taken by Inet4Address objects, so making a
 cache of such objects would make this problem less likely.
 
 -- 
 Best regards,
  Vitalii Tymchyshyn



Re: Cassandra OOM on repair.

2011-07-17 Thread Andrey Stepachev
Looks like the problem is in this code:

    public IndexSummary(long expectedKeys)
    {
        long expectedEntries = expectedKeys / DatabaseDescriptor.getIndexInterval();
        if (expectedEntries > Integer.MAX_VALUE)
            // TODO: that's a _lot_ of keys, or a very low interval
            throw new RuntimeException("Cannot use index_interval of " + DatabaseDescriptor.getIndexInterval()
                                       + " with " + expectedKeys + " (expected) keys.");
        indexPositions = new ArrayList<KeyPosition>((int) expectedEntries);
    }

I have too many keys and too small an index interval.
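
A back-of-envelope sketch of why that ArrayList hurts (the key count and the
per-entry overhead below are illustrative assumptions, not measurements):

    public class IndexSummaryMath {
        public static void main(String[] args) {
            long expectedKeys = 2000000000L;  // assume ~2 billion keys on a node
            int indexInterval = 128;          // the default sampling rate
            long entries = expectedKeys / indexInterval;
            // Assume ~100 bytes per sampled entry (key bytes, file position,
            // object overhead): the summary alone wants well over a gigabyte.
            long approxHeapBytes = entries * 100L;
            System.out.printf("%d samples, ~%.1f GB of heap%n",
                    entries, approxHeapBytes / 1e9);
        }
    }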

To fix this, I can:
1) reduce the number of keys - rewrite the app and sacrifice balance
2) increase index_interval - hurts the other column families

A question:
are there any drawbacks to using different indexInterval values for the column
families
in a keyspace? (Suppose I write a patch.)

2011/7/15 Andrey Stepachev oct...@gmail.com

 Looks like key indexes eat all memory:

 http://paste.kde.org/97213/


 2011/7/15 Andrey Stepachev oct...@gmail.com

 UPDATE:

 I found that:
 a) with a minimum heap of 10G, Cassandra survives.
 b) I have ~1000 SSTables.
 c) CompactionManager uses PrecompactedRows instead of LazilyCompactedRow.

 So, I have a question:
 a) if a row is bigger than 64MB before compaction, why is it compacted in
 memory?
 b) if it is smaller, what eats so much memory?

 2011/7/15 Andrey Stepachev oct...@gmail.com

 Hi all.

 Cassandra constantly OOMs on repair or compaction. Increasing memory
 (to 6G) doesn't help.
 I can give more, but I don't think this is a normal situation.
 Cluster has 4 nodes. RF=3.
 Cassandra version 0.8.1

 Ring looks like this:
  Address DC  RackStatus State   Load
  OwnsToken

  127605887595351923798765477786913079296
 xxx.xxx.xxx.66  datacenter1 rack1   Up Normal  176.96 GB
 25.00%  0
 xxx.xxx.xxx.69  datacenter1 rack1   Up Normal  178.19 GB
 25.00%  42535295865117307932921825928971026432
 xxx.xxx.xxx.67  datacenter1 rack1   Up Normal  178.26 GB
 25.00%  85070591730234615865843651857942052864
 xxx.xxx.xxx.68  datacenter1 rack1   Up Normal  175.2 GB
  25.00%  127605887595351923798765477786913079296

 About schema:
 I have big rows (100k, up to several million). But as far as I know, that is
 normal for Cassandra.
 Everything works relatively well until I start long-running
 pre-production tests. I load
 data, and after a while (~4 hours) the cluster begins to time out and then
 some nodes die with OOM.
 My app retries sends, so after a short period all nodes become down.
 Very nasty.

 But now, I can OOM nodes simply by calling nodetool repair.
 In the logs (http://paste.kde.org/96811/) it is clear how the heap
 rocket-jumps to its upper limit.
 cfstats shows: http://paste.kde.org/96817/
 config is: http://paste.kde.org/96823/
 The question is: does anybody know what this means? Why does Cassandra try
 to load
 something big into memory at once?

 A.






Re: Cassandra OOM on repair.

2011-07-17 Thread Jonathan Ellis
Can't think of any.

On Sun, Jul 17, 2011 at 1:27 PM, Andrey Stepachev oct...@gmail.com wrote:
 Looks like the problem is in this code:

     public IndexSummary(long expectedKeys)
     {
         long expectedEntries = expectedKeys / DatabaseDescriptor.getIndexInterval();
         if (expectedEntries > Integer.MAX_VALUE)
             // TODO: that's a _lot_ of keys, or a very low interval
             throw new RuntimeException("Cannot use index_interval of " + DatabaseDescriptor.getIndexInterval()
                                        + " with " + expectedKeys + " (expected) keys.");
         indexPositions = new ArrayList<KeyPosition>((int) expectedEntries);
     }
 I have too many keys and too small an index interval.
 To fix this, I can:
 1) reduce the number of keys - rewrite the app and sacrifice balance
 2) increase index_interval - hurts the other column families
 A question:
 are there any drawbacks to using different indexInterval values for the column
 families
 in a keyspace? (Suppose I write a patch.)
 2011/7/15 Andrey Stepachev oct...@gmail.com

 Looks like key indexes eat all memory:
 http://paste.kde.org/97213/

 2011/7/15 Andrey Stepachev oct...@gmail.com

 UPDATE:
 I found that:
 a) with a minimum heap of 10G, Cassandra survives.
 b) I have ~1000 SSTables.
 c) CompactionManager uses PrecompactedRows instead of LazilyCompactedRow.
 So, I have a question:
 a) if a row is bigger than 64MB before compaction, why is it compacted in
 memory?
 b) if it is smaller, what eats so much memory?
 2011/7/15 Andrey Stepachev oct...@gmail.com

 Hi all.
 Cassandra constantly OOMs on repair or compaction. Increasing memory
 (to 6G) doesn't help.
 I can give more, but I don't think this is a normal situation.
 Cluster has 4 nodes. RF=3.
 Cassandra version 0.8.1
 Ring looks like this:
  Address         DC          Rack        Status State   Load
  Owns    Token

        127605887595351923798765477786913079296
 xxx.xxx.xxx.66  datacenter1 rack1       Up     Normal  176.96 GB
 25.00%  0
 xxx.xxx.xxx.69  datacenter1 rack1       Up     Normal  178.19 GB
 25.00%  42535295865117307932921825928971026432
 xxx.xxx.xxx.67  datacenter1 rack1       Up     Normal  178.26 GB
 25.00%  85070591730234615865843651857942052864
 xxx.xxx.xxx.68  datacenter1 rack1       Up     Normal  175.2 GB
  25.00%  127605887595351923798765477786913079296
 About schema:
 I have big rows (100k, up to several million). But as far as I know, that is
 normal for Cassandra.
 Everything works relatively well until I start long-running
 pre-production tests. I load
 data, and after a while (~4 hours) the cluster begins to time out and then
 some nodes die with OOM.
 My app retries sends, so after a short period all nodes become down.
 Very nasty.
 But now, I can OOM nodes simply by calling nodetool repair.
 In the logs (http://paste.kde.org/96811/) it is clear how the heap
 rocket-jumps to its upper limit.
 cfstats shows: http://paste.kde.org/96817/
 config is: http://paste.kde.org/96823/
 The question is: does anybody know what this means? Why does Cassandra try
 to load
 something big into memory at once?
 A.






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Cassandra OOM on repair.

2011-07-15 Thread Andrey Stepachev
Looks like key indexes eat all memory:

http://paste.kde.org/97213/


2011/7/15 Andrey Stepachev oct...@gmail.com

 UPDATE:

 I found that:
 a) with a minimum heap of 10G, Cassandra survives.
 b) I have ~1000 SSTables.
 c) CompactionManager uses PrecompactedRows instead of LazilyCompactedRow.

 So, I have a question:
 a) if a row is bigger than 64MB before compaction, why is it compacted in memory?
 b) if it is smaller, what eats so much memory?

 2011/7/15 Andrey Stepachev oct...@gmail.com

 Hi all.

 Cassandra constantly OOMs on repair or compaction. Increasing memory
 (to 6G) doesn't help.
 I can give more, but I don't think this is a normal situation. The cluster
 has 4 nodes. RF=3.
 Cassandra version 0.8.1

 Ring looks like this:
  Address DC  RackStatus State   Load
  OwnsToken

  127605887595351923798765477786913079296
 xxx.xxx.xxx.66  datacenter1 rack1   Up Normal  176.96 GB
 25.00%  0
 xxx.xxx.xxx.69  datacenter1 rack1   Up Normal  178.19 GB
 25.00%  42535295865117307932921825928971026432
 xxx.xxx.xxx.67  datacenter1 rack1   Up Normal  178.26 GB
 25.00%  85070591730234615865843651857942052864
 xxx.xxx.xxx.68  datacenter1 rack1   Up Normal  175.2 GB
  25.00%  127605887595351923798765477786913079296

 About schema:
 I have big rows (100k, up to several million). But as far as I know, that is
 normal for Cassandra.
 Everything works relatively well until I start long-running pre-production
 tests. I load
 data, and after a while (~4 hours) the cluster begins to time out and then
 some nodes die with OOM.
 My app retries sends, so after a short period all nodes become down. Very
 nasty.

 But now, I can OOM nodes simply by calling nodetool repair.
 In the logs (http://paste.kde.org/96811/) it is clear how the heap
 rocket-jumps to its upper limit.
 cfstats shows: http://paste.kde.org/96817/
 config is: http://paste.kde.org/96823/
 The question is: does anybody know what this means? Why does Cassandra try to
 load
 something big into memory at once?

 A.