Re: cassandra OOM
Hello @Durity, would you mind sharing some information about your cluster? I am mainly interested in which version of Cassandra you use, and how long your GC pauses last. Thank you very much.

Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Tue, Apr 25, 2017 at 7:47 PM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
> We have seen much better stability (and MUCH fewer GC pauses) from G1 with
> a variety of heap sizes. I don't even consider CMS any more.
>
> Sean Durity
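As an aside, a quick way to answer the pause-time question without enabling full GC logging is Cassandra's own GCInspector log lines - the path below assumes a default packaged install:

grep GCInspector /var/log/cassandra/system.log | tail -20
# each match reports the collector and pause length, e.g. "G1 Young Generation GC in 312ms"

For finer detail, the commented GC logging flags in jvm.options (quoted further down in this digest) can be enabled.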
Re: cassandra OOM
To add a contribution to this thread: we have seen both cases - CMS easily outperforming G1 for the same heap size, and the inverse too. On the same cluster we run both collectors for different (datacenter-based) workloads, because performance depends on the workload. It would be good to collect this information and do a talk/blog, but that's for a later time.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Tue, Apr 25, 2017 at 6:47 PM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
> We have seen much better stability (and MUCH fewer GC pauses) from G1 with
> a variety of heap sizes. I don't even consider CMS any more.
>
> Sean Durity
RE: cassandra OOM
We have seen much better stability (and MUCH fewer GC pauses) from G1 with a variety of heap sizes. I don't even consider CMS any more.

Sean Durity

From: Gopal, Dhruva [mailto:dhruva.go...@aspect.com]
Sent: Tuesday, April 04, 2017 5:34 PM
To: user@cassandra.apache.org
Subject: Re: cassandra OOM

Thanks, that's interesting – so CMS is a better option for stability/performance? We'll try this out in our cluster.
Re: cassandra OOM
Thanks, that's interesting – so CMS is a better option for stability/performance? We'll try this out in our cluster.

From: Alexander Dejanovski <a...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, April 3, 2017 at 10:31 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: cassandra OOM

Hi,

we've seen G1GC going OOM on production clusters (repeatedly) with a 16GB heap when the workload is intense, and given you're running on m4.2xl I wouldn't go over 16GB for the heap.

I'd suggest reverting to CMS, using a 16GB heap and up to 6GB of new gen. You can use 5 as an initial value for MaxTenuringThreshold and activate GC logging to fine-tune the settings afterwards.

FYI, CMS tends to perform better than G1 even though it's a little bit harder to tune.

Cheers,
Re: cassandra OOM
Hi,

we've seen G1GC going OOM on production clusters (repeatedly) with a 16GB heap when the workload is intense, and given you're running on m4.2xl I wouldn't go over 16GB for the heap.

I'd suggest reverting to CMS, using a 16GB heap and up to 6GB of new gen. You can use 5 as an initial value for MaxTenuringThreshold and activate GC logging to fine-tune the settings afterwards.

FYI, CMS tends to perform better than G1 even though it's a little bit harder to tune.

Cheers,

On Mon, Apr 3, 2017 at 10:54 PM Gopal, Dhruva <dhruva.go...@aspect.com> wrote:
> 16 Gig heap, with G1. Pertinent info from jvm.options below (we're using
> m2.2xlarge instances in AWS):
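A minimal sketch of what that advice could look like in jvm.options, using the CMS block quoted elsewhere in this digest as the base - the 6G new gen, tenuring threshold of 5, and log path follow the suggestion above plus assumptions, not tested defaults:

-Xms16G
-Xmx16G
-Xmn6G
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=5
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
# GC logging, so the settings can be fine-tuned from real pauses
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-Xloggc:/var/log/cassandra/gc.log

When switching back, the G1 lines (-XX:+UseG1GC and friends) need to be commented out.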
Re: cassandra OOM
16 Gig heap, with G1. Pertinent info from jvm.options below (we're using m2.2xlarge instances in AWS):

#
# HEAP SETTINGS #
#

# Heap size is automatically calculated by cassandra-env based on this
# formula: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
# That is:
# - calculate 1/2 ram and cap to 1024MB
# - calculate 1/4 ram and cap to 8192MB
# - pick the max
#
# For production use you may wish to adjust this for your environment.
# If that's the case, uncomment the -Xmx and Xms options below to override the
# automatic calculation of JVM heap memory.
#
# It is recommended to set min (-Xms) and max (-Xmx) heap sizes to
# the same value to avoid stop-the-world GC pauses during resize, and
# so that we can lock the heap in memory on startup to prevent any
# of it from being swapped out.
-Xms16G
-Xmx16G

# Young generation size is automatically calculated by cassandra-env
# based on this formula: min(100 * num_cores, 1/4 * heap size)
#
# The main trade-off for the young generation is that the larger it
# is, the longer GC pause times will be. The shorter it is, the more
# expensive GC will be (usually).
#
# It is not recommended to set the young generation size if using the
# G1 GC, since that will override the target pause-time goal.
# More info: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
#
# The example below assumes a modern 8-core+ machine for decent
# times. If in doubt, and if you do not particularly want to tweak, go
# 100 MB per physical CPU core.
#-Xmn800M

#
# GC SETTINGS #
#

### CMS Settings

#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
#-XX:+CMSParallelRemarkEnabled
#-XX:SurvivorRatio=8
#-XX:MaxTenuringThreshold=1
#-XX:CMSInitiatingOccupancyFraction=75
#-XX:+UseCMSInitiatingOccupancyOnly
#-XX:CMSWaitDuration=1
#-XX:+CMSParallelInitialMarkEnabled
#-XX:+CMSEdenChunksRecordAlways
# some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
#-XX:+CMSClassUnloadingEnabled

### G1 Settings (experimental, comment previous section and uncomment section below to enable)

## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
#
## Have the JVM do less remembered set work during STW, instead
## preferring concurrent GC. Reduces p99.9 latency.
-XX:G1RSetUpdatingPauseTimePercent=5
#
## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
## 200ms is the JVM default and lowest viable setting
## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
-XX:MaxGCPauseMillis=500

## Optional G1 Settings

# Save CPU time on large (>= 16GB) heaps by delaying region scanning
# until the heap is 70% full. The default in Hotspot 8u40 is 40%.
-XX:InitiatingHeapOccupancyPercent=70

# For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
# Otherwise equal to the number of cores when 8 or less.
# Machines with > 10 cores should try setting these to <= full cores.
#-XX:ParallelGCThreads=16
# By default, ConcGCThreads is 1/4 of ParallelGCThreads.
# Setting both to the same value can reduce STW durations.
#-XX:ConcGCThreads=16

### GC logging options -- uncomment to enable

#-XX:+PrintGCDetails
#-XX:+PrintGCDateStamps
#-XX:+PrintHeapAtGC
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
#-XX:+PrintPromotionFailure
#-XX:PrintFLSStatistics=1
#-Xloggc:/var/log/cassandra/gc.log
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=10
#-XX:GCLogFileSize=10M
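One way to double-check which of these flags the running process actually picked up - a sketch assuming a packaged install whose main class is org.apache.cassandra.service.CassandraDaemon:

jcmd $(pgrep -f CassandraDaemon) VM.flags
# or just inspect the live command line:
ps -ef | grep [C]assandraDaemon | tr ' ' '\n' | grep -E '^-X'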
Re: cassandra OOM
Hi,

could you share your GC settings? G1 or CMS? Heap size, etc...

Thanks,

On Sun, Apr 2, 2017 at 10:30 PM Gopal, Dhruva <dhruva.go...@aspect.com> wrote:
> Hi –
>
> We've had what looks like an OOM situation with Cassandra (we have a
> dump file that got generated) in our staging (performance/load testing)
> environment, and I wanted to reach out to this user group to see if you had
> any recommendations on how we should approach our investigation into the
> cause of this issue. The logs don't seem to point to any obvious issues,
> and we're no experts at analyzing this by any means, so I was looking for
> guidance on how to proceed. Should we enter a Jira as well? We're on
> Cassandra 3.9 and are running a six-node cluster. This happened in a
> controlled load-testing environment. Feedback will be much appreciated!
>
> Regards,
> Dhruva
--
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
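As a side note for anyone hitting this: the dump file mentioned above is produced by the standard HotSpot flags, which are worth keeping enabled so the next OOM is also captured - the path here is an example, not a Cassandra default:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/cassandra/dumps   # directory must exist and be writable by the cassandra user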
Re: Cassandra OOM on joining existing ring
Are you on the Azure premium storage?
http://www.datastax.com/2015/04/getting-started-with-azure-premium-storage-and-datastax-enterprise-dse

Secondary indexes are built for convenience, not performance.
http://www.datastax.com/resources/data-modeling

What's your compaction strategy? Your nodes have to come up in order for them to start compacting.

On Jul 13, 2015 1:11 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
Hi, looks like that is my primary problem - the sstable count for the daily_challenges column family is 5k.
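For reference, the compaction-strategy question can be answered from cqlsh (the table name is taken from the schema posted later in this thread):

cqlsh> DESCRIBE TABLE app_10001.daily_challenges;

The compaction = {...} map in the output names the strategy - SizeTieredCompactionStrategy, per the schema that appears below.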
Re: Cassandra OOM on joining existing ring
We faced a similar issue where we had 60k sstables due to the coldness bug in 2.0.3. We solved it by following the DataStax production recommendations at http://docs.datastax.com/en/cassandra/1.2/cassandra/install/installRecommendSettings.html :

Step 1: Add the following line to /etc/sysctl.conf:

vm.max_map_count = 131072

Step 2: To make the changes take effect, reboot the server or run the following command:

$ sudo sysctl -p

Step 3 (optional): To confirm the limits are applied to the Cassandra process, run the following command, where pid is the process ID of the currently running Cassandra process:

$ cat /proc/pid/limits

You can try the above settings and share your results.

Thanks
Anuj

Sent from Yahoo Mail on Android

From: Sebastian Estevez <sebastian.este...@datastax.com>
Date: Mon, 13 Jul, 2015 at 7:02 pm
Subject: Re: Cassandra OOM on joining existing ring

Are you on the Azure premium storage?
http://www.datastax.com/2015/04/getting-started-with-azure-premium-storage-and-datastax-enterprise-dse

Secondary indexes are built for convenience, not performance.
http://www.datastax.com/resources/data-modeling

What's your compaction strategy? Your nodes have to come up in order for them to start compacting.
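A sketch of applying the same setting immediately and verifying it, for anyone who cannot reboot - the pgrep pattern is an assumption about how the daemon was started:

sudo sysctl -w vm.max_map_count=131072          # takes effect at once, no reboot
sysctl vm.max_map_count                          # confirm the kernel value
cat /proc/$(pgrep -f CassandraDaemon)/limits     # confirm limits on the live process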
Re: Cassandra OOM on joining existing ring
Hi,

Looks like that is my primary problem - the sstable count for the daily_challenges column family is 5k.

Azure had a scheduled maintenance window on Sat. All the VMs got rebooted one by one - including the current cassandra one - and it's taking forever to bring cassandra back up online.

Is there any way I can re-organize my existing data so that I can bring down that count? I don't want to lose that data. If possible, can I do that while cassandra is down?

As I mentioned, it's taking forever to get the service up - it's stuck reading those 5k sstables (+ another 5k of corresponding secondary index files). :(

Oh, did I mention I'm new to cassandra?

Thanks,
Kunal

On 11 July 2015 at 03:29, Sebastian Estevez <sebastian.este...@datastax.com> wrote:

#1 "There is one table - daily_challenges - which shows compacted partition max bytes as ~460M and another one - daily_guest_logins - which shows compacted partition max bytes as ~36M."

460M is high; I like to keep my partitions under 100MB when possible. I've seen worse, though. The fix is to add something else (maybe month or week or something) into your partition key:

PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)

#2 Looks like your jamm version is 3 per your env.sh, so you're probably okay to copy the env.sh over from the C* 3.0 link I shared once you uncomment and tweak the MAX_HEAP. If there's something wrong, your node won't come up. tail your logs.

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Fri, Jul 10, 2015 at 2:44 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
And here is my cassandra-env.sh: https://gist.github.com/kunalg/2c092cb2450c62be9a20
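A hypothetical sketch of Sebastian's suggested fix applied to the schema posted later in this thread - the month column, its text type, and the _v2 table name are illustrative assumptions, not something from the thread:

CREATE TABLE app_10001.daily_challenges_v2 (
    segment_type text,
    month text,  -- e.g. '2015-07'; the bucket granularity is a guess
    date timestamp,
    user_id int,
    sess_id text,
    data text,
    deleted boolean,
    PRIMARY KEY ((segment_type, month), date, user_id, sess_id)
) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC);

Existing rows would have to be copied into the new table. Separately, for the immediate 5k-sstable count: once the node is back up, a major compaction (nodetool compact app_10001 daily_challenges) can merge them under SizeTieredCompactionStrategy, at the cost of producing one very large sstable.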
Re: Cassandra OOM on joining existing ring
Attaching the stack dump captured from the last OOM.

Kunal

On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
Forgot to mention: the data size is not that big - it's barely 10GB in all.

ERROR [SharedPool-Worker-6] 2015-07-10 05:12:16,862 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_45]
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_45]
    at org.apache.cassandra.utils.memory.SlabAllocator.getRegion(SlabAllocator.java:137) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.utils.memory.SlabAllocator.allocate(SlabAllocator.java:97) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.utils.memory.ContextAllocator.allocate(ContextAllocator.java:57) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.utils.memory.ContextAllocator.clone(ContextAllocator.java:47) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.utils.memory.MemtableBufferAllocator.clone(MemtableBufferAllocator.java:61) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.Memtable.put(Memtable.java:192) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1212) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:131) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.index.SecondaryIndexManager$StandardUpdater.insert(SecondaryIndexManager.java:791) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:444) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:418) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.utils.btree.BTree.build(BTree.java:116) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.utils.btree.BTree.update(BTree.java:177) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.Memtable.put(Memtable.java:210) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1212) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:389) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:352) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_45]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.7.jar:2.1.7]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.7.jar:2.1.7]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
ERROR [CompactionExecutor:3] 2015-07-10 05:12:16,862 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:3,1,main]
java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayDeque.doubleCapacity(ArrayDeque.java:157) ~[na:1.8.0_45]
Re: Cassandra OOM on joining existing ring
Forgot to mention: the data size is not that big - it's barely 10GB in all.

Kunal

On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
Hi,

I have a 2-node setup on Azure (East US region) running Ubuntu Server 14.04 LTS. Both nodes have 8GB RAM.

One of the nodes (the seed node) died with OOM - so I am trying to add a replacement node with the same configuration.

The problem is that this new node also keeps dying with OOM - I've restarted the cassandra service some 8-10 times hoping that it would finish the replication. But it didn't help. The one node that is still up is happily chugging along.

All nodes have similar configuration - with libjna installed. Cassandra is installed from DataStax's debian repo - pkg: dsc21, version 2.1.7.

I started off with the default configuration - i.e. the default cassandra-env.sh - which calculates the heap size automatically (1/4 * RAM = 2GB). But that didn't help. So I then tried to increase the heap to 4GB manually and restarted. It still keeps crashing.

Any clue as to why it's happening?

Thanks,
Kunal
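For reference, a sketch of the manual override described above as it would look in cassandra-env.sh - the variable names are the standard ones in that script, and the values mirror the 4GB attempt mentioned:

# cassandra-env.sh - override the automatic heap calculation
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"   # must be set together with MAX_HEAP_SIZE; ~100MB per physical core

The 100MB-per-core rule of thumb comes from the commented guidance in the stock config quoted earlier in this digest.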
Re: Cassandra OOM on joining existing ring
I'm new to cassandra. How do I find those out? - mainly, the partition params that you asked for. Others, I think I can figure out.

We don't have any large objects/blobs in the column values - it's all textual, date-time, numeric and uuid data.

We use cassandra primarily to store segmentation data - with segment type as the partition key. That is again divided into two separate column families, but they have a similar structure. Columns per row can be fairly large - each segment type is the row key, with associated user ids and timestamps as column values.

Thanks,
Kunal

On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
What does your data and data model look like - partition size, rows per partition, number of columns per row, any large values/blobs in column values?

You could run fine on an 8GB system, but only if your rows and partitions are reasonably small. Any large partitions could blow you away.

-- Jack Krupansky
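A sketch of how those numbers can be pulled with nodetool (the keyspace and table names are taken from the schema posted further down in this thread):

nodetool cfstats app_10001
# look for "Compacted partition maximum bytes" and "Maximum live cells per slice"
nodetool cfhistograms app_10001 daily_challenges
# percentile breakdown of partition sizes, cell counts, and sstables touched per read

Both commands are mentioned elsewhere in the thread; cfhistograms is the per-table drill-down.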
Re: Cassandra OOM on joining existing ring
You, and only you, are responsible for knowing your data and data model.

If columns per row or rows per partition can be large, then an 8GB system is probably too small. But the real issue is that you need to keep your partition size from getting too large. Generally, an 8GB system is okay, but only for reasonably-sized partitions, like under 10MB.

-- Jack Krupansky

On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
I'm new to cassandra. How do I find those out? - mainly, the partition params that you asked for.
Re: Cassandra OOM on joining existing ring
What does your data and data model look like - partition size, rows per partition, number of columns per row, any large values/blobs in column values?

You could run fine on an 8GB system, but only if your rows and partitions are reasonably small. Any large partitions could blow you away.

-- Jack Krupansky

On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
Attaching the stack dump captured from the last OOM.
Re: Cassandra OOM on joining existing ring
Thanks for the quick reply.

1. I don't know what thresholds I should look for. So, to save this back-and-forth, I'm attaching the cfstats output for the keyspace. There is one table - daily_challenges - which shows compacted partition max bytes of ~460M, and another one - daily_guest_logins - which shows compacted partition max bytes of ~36M. Can that be a problem?

Here is the CQL schema for the daily_challenges column family:

    CREATE TABLE app_10001.daily_challenges (
        segment_type text,
        date timestamp,
        user_id int,
        sess_id text,
        data text,
        deleted boolean,
        PRIMARY KEY (segment_type, date, user_id, sess_id)
    ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
        AND comment = ''
        AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
        AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99.0PERCENTILE';

    CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);

2. I don't know - how do I check? As I mentioned, I just installed the dsc21 package from DataStax's debian repo (ver 2.1.7).

Really appreciate your help.

Thanks,
Kunal
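For readers following along, a minimal sketch of pulling the two figures Sebastian points at out of nodetool on a 2.1-era node (the keyspace and table names come from the schema above; the exact cfstats labels can vary slightly between 2.1 point releases):

    # Per-table stats; watch for oversized partitions and wide slices.
    nodetool cfstats app_10001 | egrep 'Table:|Compacted partition maximum bytes|Maximum live cells per slice'

    # Read-path histograms, including SSTables touched per read.
    nodetool cfhistograms app_10001 daily_challenges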
Re: Cassandra OOM on joining existing ring
And here is my cassandra-env.sh: https://gist.github.com/kunalg/2c092cb2450c62be9a20

Kunal
Re: Cassandra OOM on joining existing ring
Thanks, Sebastian.

Couple of questions (I'm really new to Cassandra):

1. How do I interpret the output of 'nodetool cfstats' to figure out the issues? Any documentation pointer on that would be helpful.
2. I'm primarily a python/c developer - so, totally clueless about the JVM environment. Please bear with me, as I will need a lot of hand-holding. Should I just copy+paste the settings you gave and try to restart the failing cassandra server?

Thanks,
Kunal
Re: Cassandra OOM on joining existing ring
1. You want to look at the # of sstables in cfhistograms, or in cfstats look at:
- Compacted partition maximum bytes
- Maximum live cells per slice

2. No. Here's the env.sh from 3.0, which should work with some tweaks: https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh

You'll at least have to modify the jamm version to what's in yours. I think it's 2.5.

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
Re: Cassandra OOM on joining existing ring
From the jhat output, the top 10 entries for "Instance Count for All Classes (excluding platform)" show:

    2088223 instances of class org.apache.cassandra.db.BufferCell
    1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
    1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
    63 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
    503687 instances of class org.apache.cassandra.db.BufferDeletedCell
    378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
    101800 instances of class org.apache.cassandra.utils.concurrent.Ref
    101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
    90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
    71123 instances of class org.apache.cassandra.db.BufferDecoratedKey

At the bottom of the page, it shows: Total of 8739510 instances occupying 193607512 bytes.

JFYI.

Kunal
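For anyone reproducing this kind of inspection, a minimal sketch of opening the OOM heap dump with the stock JDK tool (the dump path and pid are placeholders; this assumes the node ran with -XX:+HeapDumpOnOutOfMemoryError, which the stock cassandra-env.sh enables):

    # jhat ships with JDK 6-8 and serves an object browser on port 7000;
    # give jhat itself enough heap to load the whole dump.
    jhat -J-Xmx6g /var/lib/cassandra/java_pid12345.hprof
    # Then open http://localhost:7000/ and follow
    # "Show instance counts for all classes (excluding platform)".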
Re: Cassandra OOM on joining existing ring
I upgraded my instance from 8GB to a 14GB one and allocated 8GB to the JVM heap in cassandra-env.sh. And now, it crashes even faster with an OOM.

Earlier, with a 4GB heap, I could get up to ~90% replication completion (as reported by nodetool netstats); now, with an 8GB heap, I cannot even get there. I've already restarted the cassandra service 4 times with the 8GB heap. No clue what's going on.. :(

Kunal

On 10 July 2015 at 17:45, Jack Krupansky jack.krupan...@gmail.com wrote:

You, and only you, are responsible for knowing your data and data model. If columns per row or rows per partition can be large, then an 8GB system is probably too small. But the real issue is that you need to keep your partition size from getting too large. Generally, an 8GB system is okay, but only for reasonably-sized partitions, like under 10MB.

-- Jack Krupansky

On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar kgangakhed...@gmail.com wrote:

I'm new to cassandra. How do I find those out? - mainly, the partition params that you asked for. Others, I think I can figure out.

We don't have any large objects/blobs in the column values - it's all textual, date-time, numeric and uuid data. We use cassandra primarily to store segmentation data, with segment type as the partition key. That is again divided into two separate column families, but they have a similar structure. Columns per row can be fairly large - each segment type is the row key, with associated user ids and timestamps as column values.

Thanks, Kunal

On 10 July 2015 at 16:36, Jack Krupansky jack.krupan...@gmail.com wrote:

What does your data and data model look like - partition size, rows per partition, number of columns per row, any large values/blobs in column values? You could run fine on an 8GB system, but only if your rows and partitions are reasonably small. Any large partitions could blow you away.

-- Jack Krupansky

On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar kgangakhed...@gmail.com wrote:

Attaching the stack dump captured from the last OOM. Kunal

On 10 July 2015 at 13:32, Kunal Gangakhedkar kgangakhed...@gmail.com wrote:

Forgot to mention: the data size is not that big - it's barely 10GB in all. Kunal

On 10 July 2015 at 13:29, Kunal Gangakhedkar kgangakhed...@gmail.com wrote:

Hi,

I have a 2-node setup on Azure (East US region) running Ubuntu Server 14.04 LTS. Both nodes have 8GB RAM. One of the nodes (the seed node) died with an OOM, so I am trying to add a replacement node with the same configuration. The problem is that this new node also keeps dying with an OOM - I've restarted the cassandra service 8-10 times hoping it would finish the replication, but it didn't help. The one node that is still up is happily chugging along.

All nodes have a similar configuration, with libjna installed. Cassandra is installed from DataStax's debian repo - pkg dsc21, version 2.1.7. I started off with the default configuration, i.e. the default cassandra-env.sh, which calculates the heap size automatically (1/4 * RAM = 2GB). But that didn't help. So I then tried to increase the heap to 4GB manually and restarted. It still keeps crashing.

Any clue as to why it's happening?

Thanks, Kunal
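A minimal sketch of the manual heap override being described, assuming the stock cassandra-env.sh shipped with the dsc21 package (the values are the ones from the message; the script insists both variables be set together):

    # /etc/cassandra/cassandra-env.sh -- uncomment and set BOTH,
    # otherwise the automatic 1/4-of-RAM calculation is used.
    MAX_HEAP_SIZE="4G"
    # New-gen size; common guidance is min(100MB * cores, 1/4 of the heap).
    HEAP_NEWSIZE="800M"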
Re: Cassandra OOM on joining existing ring
#1 You need more information.

a) Take a look at your .hprof file (the memory heap from the OOM) with an introspection tool like jhat, visualvm, or Java Flight Recorder and see what is using up your RAM.

b) How big are your large rows (use nodetool cfstats on each node)? If your data model is bad, you are going to have to re-design it no matter what.

#2 As a possible workaround, try using the G1GC collector with the settings from C* 3.0 instead of CMS. I've seen lots of success with it lately (tl;dr G1GC is much simpler than CMS and almost as good as a finely tuned CMS). *Note:* Use it with the latest Java 8 from Oracle. Do *not* set the newgen size; G1 sets it dynamically:

    # min and max heap sizes should be set to the same value to avoid
    # stop-the-world GC pauses during resize, and so that we can lock the
    # heap in memory on startup to prevent any of it from being swapped out.
    JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
    JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

    # Per-thread stack size.
    JVM_OPTS="$JVM_OPTS -Xss256k"

    # Use the Hotspot garbage-first collector.
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"

    # Have the JVM do less remembered set work during STW, instead
    # preferring concurrent GC. Reduces p99.9 latency.
    JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"

    # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
    # Machines with > 10 cores may need additional threads.
    # Increase to <= full cores (do not count HT cores).
    #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
    #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"

    # Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
    # 200ms is the JVM default and lowest viable setting;
    # 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
    JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"

    # Do reference processing in parallel GC.
    JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"

    # This may help eliminate STW.
    # The default in Hotspot 8u40 is 40%.
    #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

    # For workloads that do large allocations, increasing the region
    # size may make things more efficient. Otherwise, let the JVM
    # set this automatically.
    #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"

    # Make sure all memory is faulted and zeroed on startup.
    # This helps prevent soft faults in containers and makes
    # transparent hugepage allocation more effective.
    JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"

    # Biased locking does not benefit Cassandra.
    JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

    # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
    JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"

    # Enable thread-local allocation blocks and allow the JVM to automatically
    # resize them at runtime.
    JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"

    # http://www.evanjones.ca/jvm-mmap-pause.html
    JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
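For anyone applying the block above to the 2.1 node in this thread, a short sketch of the mechanics under the Debian package layout (the path and service name are assumptions about that layout):

    # Comment out the CMS options in /etc/cassandra/cassandra-env.sh,
    # paste in the G1 block above, then bounce the node.
    sudo service cassandra restart

    # Confirm the running JVM actually picked up G1.
    ps -ef | grep '[C]assandraDaemon' | grep -o 'XX:+UseG1GC'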
Re: Cassandra OOM on joining existing ring
#1
> There is one table - daily_challenges - which shows compacted partition max bytes as ~460M and another one - daily_guest_logins - which shows compacted partition max bytes as ~36M.

460M is high; I like to keep my partitions under 100MB when possible. I've seen worse, though. The fix is to add something else (maybe month, or week, or something like that) into your partition key:

    PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)

#2 It looks like your jamm version is 3 per your env.sh, so you're probably okay to copy the env.sh over from the C* 3.0 link I shared once you uncomment and tweak MAX_HEAP. If there's something wrong, your node won't come up - tail your logs.

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
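A sketch of what that re-keying could look like in CQL, assuming a month-based bucket (the table name daily_challenges_v2 and the month_bucket column are illustrative, not from the thread; since Cassandra cannot alter a primary key, the data would have to be rewritten into the new table):

    cqlsh -e "
    CREATE TABLE app_10001.daily_challenges_v2 (
        segment_type text,
        month_bucket text,   -- e.g. '2015-07', derived from date at write time
        date timestamp,
        user_id int,
        sess_id text,
        data text,
        deleted boolean,
        PRIMARY KEY ((segment_type, month_bucket), date, user_id, sess_id)
    ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC);"

The trade-off: bucketing spreads a segment's data across one partition per month, capping partition growth, but reads that span months then have to query multiple partitions.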
Re: Cassandra OOM, many deletedColumn
> For JVM Heap it is 2G

Try 4G.

> gc_grace = 1800

Realised that I did not provide a warning about the implication this has for nodetool repair. If you are doing deletes on the CF, you need to run nodetool repair every gc_grace seconds.

In this case I think your main problem was not enough JVM heap. Try setting it to 4G and see how that goes.

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
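A sketch of what honouring that repair cadence could look like operationally, assuming cron on each node (illustrative only: gc_grace = 1800 means tombstones become purgeable after 30 minutes, so each repair would have to finish well inside that window, which is one reason such a low gc_grace is aggressive):

    # /etc/cron.d/cassandra-repair -- repair this node's primary ranges
    # for the purge CF every 20 minutes, before gc_grace (30 min) expires.
    */20 * * * * root /usr/bin/nodetool repair -pr <keyspace> purge >> /var/log/cassandra/repair.log 2>&1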
Re: Cassandra OOM, many deletedColumn
Thanks for your reply. We will try both of your recommendations.

The OS memory is 8G; for the JVM heap it is 2G. DeletedColumn used 1.4G, rooted from the readStage threads. Do you think we need to increase the size of the JVM heap?

The configuration for the index column family is:

    create column family purge
      with column_type = 'Standard'
      and comparator = 'UTF8Type'
      and default_validation_class = 'BytesType'
      and key_validation_class = 'UTF8Type'
      and read_repair_chance = 1.0
      and gc_grace = 1800
      and min_compaction_threshold = 4
      and max_compaction_threshold = 32
      and replicate_on_write = true
      and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';

Best Regards!
Jian Jin
Re: Cassandra OOM, many deletedColumn
You need to provide some details of the machine and the JVM configuration, but let's say you need 4GB to 8GB for the JVM heap.

If you have many deleted columns, I would say you have a *lot* of garbage in each row. Consider reducing the gc_grace seconds so the columns are purged more frequently; note, however, that columns are only purged when all fragments of the row are part of the minor compaction.

If you have a mixed write/delete workload, consider using the Levelled compaction strategy: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
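On this 1.1-era cluster, the switch Aaron suggests would go through cassandra-cli, the same tool the schema above was written for. A sketch (the keyspace name is a placeholder; note that LCS rewrites all existing SSTables, so expect a burst of compaction I/O after the change):

    cat > lcs.txt <<'EOF'
    use <keyspace>;
    update column family purge
      with compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy';
    EOF
    cassandra-cli -h localhost -f lcs.txt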
Re: Cassandra OOM, many deletedColumn
hmm.. did you manage to take a look using nodetool tpstats? That may give you further indication..

Jason

On Thu, Mar 7, 2013 at 1:56 PM, 金剑 jinjia...@gmail.com wrote:

Hi,

My version is 1.1.7.

Our use case is: we have an index column family to record how many resources are stored for a user. The number might vary from tens to millions. We provide a feature to let users delete resources by prefix. We found some Cassandra nodes will OOM after some period. The cluster is a kind of cross-datacenter ring.

1. Exceptions in the cassandra log:

    ERROR [Thread-5810] 2013-02-04 05:38:13,882 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-5810,5,main]
    java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(ThreadPoolExecutor.java:758)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

    ERROR [Thread-5819] 2013-02-04 05:38:13,888 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-5819,5,main]
    java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

    ERROR [Thread-36] 2013-02-04 05:38:13,898 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-36,5,main]
    java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

    ERROR [Thread-3990] 2013-02-04 05:38:13,902 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-3990,5,main]
    java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)

    ERROR [ACCEPT-/10.139.50.62] AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ACCEPT-/10.139.50.62,5,main]
    java.lang.RuntimeException: java.nio.channels.ClosedChannelException
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:710)
    Caused by: java.nio.channels.ClosedChannelException
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:137)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:699)

    INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 374) Timed out replaying hints to /23.20.84.240; aborting further deliveries
    INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint
    INFO [HintedHandoff:1] 2013-02-04 05:38:24,971 HintedHandOffManager.java (line 296) Started hinted handoff for token: 3

2. From the heap dump, there are many DeletedColumn instances found, rooted from the readStage threads.

Please help: where might the problem be?

Best Regards!
Jian Jin
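For completeness, the check Jason suggests is a one-liner; on 1.1 the figures to watch are the pending and dropped counts, particularly for ReadStage, since that is where the DeletedColumn instances were rooted:

    nodetool -h localhost tpstats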
Re: Cassandra OOM crash while mapping commitlog
Everything still runs smoothly. It's quite plausible that the 1.1.3 release resolved this bug.

--
With kind regards,
Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl
Re: Cassandra OOM crash while mapping commitlog
@Tyler: We were already running most of our machines on a 64-bit JVM (Sun, not OpenJDK). Those also crashed.

@Holger: Good to hear that. I'll schedule an update for our Cassandra cluster.

Thank you both for your time.

2012/8/13 Holger Hoffstaette holger.hoffstae...@googlemail.com:

On Sun, 12 Aug 2012 13:36:42 +0200, Robin Verlangen wrote:
> Hmm, is this issue caused by some 1.x version? Before, it never occurred to us.

This bug was introduced in 1.1.0 and has been fixed in 1.1.3, where the closed/recycled segments are now closed and unmapped properly. The default sizes are also smaller. Of course, the question remains why an append-only commitlog needs to be mmap'ed in the first place, especially for writing..

-h

--
With kind regards,
Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl
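As a way to observe the behaviour Holger describes from the outside, a sketch of counting live commitlog mappings in the Cassandra process on Linux (procfs assumed; the pgrep pattern is illustrative):

    PID=$(pgrep -f CassandraDaemon | head -1)
    # File-backed mappings pointing at commitlog segments; on an affected
    # 1.1.0-1.1.2 node this count would keep growing instead of being unmapped.
    grep -c commitlog /proc/$PID/maps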
Re: Cassandra OOM crash while mapping commitlog
3 hours ago I finished the upgrade of our cluster. Currently it runs quite smoothly. I'll give an update within a week on whether this really solved our issues. Cheers! 2012/8/13 Robin Verlangen ro...@us2.nl
Re: Cassandra OOM crash while mapping commitlog
Hmm, is this issue caused by some 1.x version? It never occurred for us before. On 11 Aug 2012 at 22:36, Tyler Hobbs ty...@datastax.com wrote: We've seen something similar when running on a 32-bit JVM, so make sure you're using the latest 64-bit Java 6 JVM.
Re: Cassandra OOM crash while mapping commitlog
On Sun, 12 Aug 2012 13:36:42 +0200, Robin Verlangen wrote: Hmm, is this issue caused by some 1.x version? It never occurred for us before. This bug was introduced in 1.1.0 and was fixed in 1.1.3, where the closed/recycled segments are now properly unmapped. The default sizes are also smaller. Of course, the question remains why an append-only commitlog needs to be mmap'ed in the first place, especially for writing... -h
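For context on what "properly unmapped" means here: on the Sun JVM, a MappedByteBuffer's mapping is only released when the buffer object is garbage collected, so recycled commitlog segments that are still referenced keep their address space pinned. Below is a minimal sketch of the explicit clean-up technique, using the pre-Java-9 sun.misc.Cleaner back door; this is an illustration of the idea, not Cassandra's actual code:

import java.io.RandomAccessFile;
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class UnmapDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("/tmp/segment.log", "rw")) {
            raf.setLength(1 << 20); // pretend this is a commitlog segment
            MappedByteBuffer buf =
                raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);
            buf.putLong(0, 42L); // write through the mapping
            buf.force();         // flush dirty pages to disk
            unmap(buf);          // release the mapping now, not at some future GC
        }
    }

    // Reflection-based unmap; works on pre-Java-9 Sun/Oracle JVMs only.
    static void unmap(MappedByteBuffer buf) throws Exception {
        Method cleanerMethod = buf.getClass().getMethod("cleaner");
        cleanerMethod.setAccessible(true);
        Object cleaner = cleanerMethod.invoke(buf);
        Method clean = cleaner.getClass().getMethod("clean");
        clean.setAccessible(true);
        clean.invoke(cleaner);
    }
}

After unmap() the buffer must never be touched again; accessing an unmapped buffer crashes the JVM, which is exactly why the JDK never exposed this officially.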
Re: Cassandra OOM crash while mapping commitlog
We've seen something similar when running on a 32-bit JVM, so make sure you're using the latest 64-bit Java 6 JVM. On Sat, Aug 11, 2012 at 11:59 AM, Robin Verlangen ro...@us2.nl wrote: Hi there, I currently see Cassandra crash every couple of days. I run a 3-node cluster on version 1.1.2. Does anyone have a clue why it crashes? I couldn't find a fix for it in a newer release. Is this an actual bug, or did I do something wrong? Thank you in advance for your time. Last 100 log lines before the crash:
INFO [FlushWriter:39] 2012-08-11 12:51:00,933 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-hd-7-Data.db (10778171 bytes) for commitlog position ReplayPosition(segmentId=2831860362157183, position=89962041)
INFO [OptionalTasks:1] 2012-08-11 13:12:30,940 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74393593 bytes)
INFO [OptionalTasks:1] 2012-08-11 13:12:30,941 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@32552383(22883734/74393593 serialized/live bytes, 227279 ops)
INFO [FlushWriter:40] 2012-08-11 13:12:30,941 Memtable.java (line 266) Writing Memtable-wordevents@32552383(22883734/74393593 serialized/live bytes, 227279 ops)
INFO [FlushWriter:40] 2012-08-11 13:12:31,703 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-158-Data.db (11800327 bytes) for commitlog position ReplayPosition(segmentId=2831860362157183, position=116934579)
INFO [MemoryMeter:1] 2012-08-11 14:01:36,942 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='rollups7200') liveRatio is 6.158919689235077 (just-counted was 4.408341190092955). calculation took 100ms for 16409 columns
INFO [CompactionExecutor:88] 2012-08-11 14:08:27,875 AutoSavingCache.java (line 262) Saved KeyCache (38164 items) in 70 ms
INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74346493 bytes)
INFO [OptionalTasks:1] 2012-08-11 14:18:37,519 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@10789879(22869246/74346493 serialized/live bytes, 226341 ops)
INFO [FlushWriter:41] 2012-08-11 14:18:37,520 Memtable.java (line 266) Writing Memtable-wordevents@10789879(22869246/74346493 serialized/live bytes, 226341 ops)
INFO [FlushWriter:41] 2012-08-11 14:18:38,288 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/CloudPelican/wordevents/CloudPelican-wordevents-hd-159-Data.db (11796722 bytes) for commitlog position ReplayPosition(segmentId=2838466681767183, position=67094743)
WARN [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 197) setting live ratio to minimum of 1.0 instead of 0.45760196307363504
INFO [MemoryMeter:1] 2012-08-11 14:21:55,676 Memtable.java (line 213) CFS(Keyspace='Wupa', ColumnFamily='PageViewsHost') liveRatio is 1.0421914932457101 (just-counted was 1.0). calculation took 2ms for 175 columns
INFO [MemoryMeter:1] 2012-08-11 14:33:20,916 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='rollups60') liveRatio is 4.067582667928898 (just-counted was 4.031462910772899). calculation took 711ms for 169224 columns
INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='OpsCenter', ColumnFamily='pdps') (estimated 74395427 bytes)
INFO [OptionalTasks:1] 2012-08-11 14:59:20,909 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-pdps@30500189(9222554/74395427 serialized/live bytes, 214478 ops)
INFO [FlushWriter:42] 2012-08-11 14:59:20,910 Memtable.java (line 266) Writing Memtable-pdps@30500189(9222554/74395427 serialized/live bytes, 214478 ops)
INFO [FlushWriter:42] 2012-08-11 14:59:21,420 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/OpsCenter/pdps/OpsCenter-pdps-hd-11351-Data.db (6928124 bytes) for commitlog position ReplayPosition(segmentId=2838466681767183, position=117115966)
INFO [MemoryMeter:1] 2012-08-11 14:59:31,138 Memtable.java (line 213) CFS(Keyspace='OpsCenter', ColumnFamily='pdps') liveRatio is 14.460953759840738 (just-counted was 14.460953759840738). calculation took 28ms for 878 columns
INFO [OptionalTasks:1] 2012-08-11 15:25:41,366 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='CloudPelican', ColumnFamily='wordevents') (estimated 74974061 bytes)
INFO [OptionalTasks:1] 2012-08-11 15:25:41,367 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-wordevents@24703812(23062288/74974061 serialized/live bytes, 228878 ops)
INFO [FlushWriter:43] 2012-08-11 15:25:41,367 Memtable.java (line 266) Writing
Re: Cassandra OOM - 1.0.2
Just to ask the stupid question: have you tried setting it really high? Like 50? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/02/2012, at 10:27 AM, Ajeet Grewal wrote: Here are the last few lines of strace (of one of the threads). There are a bunch of mmap system calls. Notice the last mmap call a couple of lines before the trace ends. Could the last mmap call fail?
Re: Cassandra OOM - 1.0.2
On Tue, Feb 7, 2012 at 10:45 AM, aaron morton aa...@thelastpickle.com wrote: Just to ask the stupid question: have you tried setting it really high? Like 50? No I have not. I moved to mmap_index_only as a stopgap solution. Is it possible for there to be that many mmaps for about 300 db files? -- Regards, Ajeet
Re: Cassandra OOM - 1.0.2
On Sat, Feb 4, 2012 at 7:03 AM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like you need to increase sysctl vm.max_map_count. This did not work. I increased vm.max_map_count from 65536 to 131072. I am still getting the same error.
ERROR [SSTableBatchOpen:4] 2012-02-06 11:43:50,463 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[SSTableBatchOpen:4,5,main]
java.io.IOError: java.io.IOException: Map failed
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:225)
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:202)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:380)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:159)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:197)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Map failed
-- Regards, Ajeet
Re: Cassandra OOM - 1.0.2
On Mon, Feb 6, 2012 at 11:50 AM, Ajeet Grewal asgre...@gmail.com wrote: On Sat, Feb 4, 2012 at 7:03 AM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like you need to increase sysctl vm.max_map_count. This did not work. I increased vm.max_map_count from 65536 to 131072. I am still getting the same error. The number of files in the data directory is small (~300), so I don't see why mmap should fail because of this. -- Regards, Ajeet
Re: Cassandra OOM - 1.0.2
Here are the last few lines of strace (of one of the threads). There are a bunch of mmap system calls. Notice the last mmap call a couple of lines before the trace ends. Could the last mmap call fail?
== BEGIN STRACE ==
mmap(NULL, 2147487599, PROT_READ, MAP_SHARED, 37, 0xbb000) = 0x7709b54000
fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
mmap(NULL, 214743, PROT_READ, MAP_SHARED, 37, 0xc7fffb000) = 0x7789b55000
fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
mmap(NULL, 2147483522, PROT_READ, MAP_SHARED, 37, 0xc4000) = 0x7809b4f000
fstat(37, {st_mode=S_IFREG|0644, st_size=59568105422, ...}) = 0
mmap(NULL, 1586100174, PROT_READ, MAP_SHARED, 37, 0xd7fff3000) = 0x7889b4f000
dup2(40, 37) = 37
close(37) = 0
open("/home/y/var/fresh_cassandra/data/fresh/counter_object-h-4240-Filter.db", O_RDONLY) = 37
. . . .
close(37) = 0
futex(0x2ab5a39754, FUTEX_WAKE, 1) = 1
futex(0x2ab5a39750, FUTEX_WAKE, 1) = 1
futex(0x40116940, FUTEX_WAKE, 1) = 1
mmap(0x41a17000, 12288, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x41a17000
rt_sigprocmask(SIG_SETMASK, [QUIT], NULL, 8) = 0
_exit(0) = ?
== END STRACE ==
-- Regards, Ajeet
Re: Cassandra OOM - 1.0.2
Sounds like you need to increase sysctl vm.max_map_count. On Fri, Feb 3, 2012 at 7:27 PM, Ajeet Grewal asgre...@gmail.com wrote: Hey guys, I am getting an out of memory (mmap failed) error with Cassandra 1.0.2. The relevant log lines are pasted at http://pastebin.com/UM28ZC1g. Cassandra works fine until it reaches about 300-400GB of load (on one instance; I have 12 nodes, RF=2). Then nodes start failing with such errors. The nodes are pretty beefy: 32GB of RAM, 8 cores. Increasing the JVM heap size does not help. I am running on a 64-bit JVM. I am using JNA. I have memlock unlimited for the user (I confirmed this by looking at /proc/pid/limits). I also tried restarting the process as root, but it crashes with the same error. Also, the number of files that I have in the data directory is about ~300, so it should not be exceeding the open files limit. I don't know if this is relevant, but I just have two column families, counter_object and counter_time. I am using very wide columns, so row sizes can be huge. You can see from the log link that the *.db files are sometimes pretty big. Please help! Thank you! -- Regards, Ajeet -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
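One thing worth remembering when debugging "Map failed" against vm.max_map_count: the ~300 data files are not the whole story, because each large SSTable is mapped in roughly 2 GB chunks (the strace above shows successive mmap calls of about 2^31 bytes against a 59.5 GB file, i.e. around 28 mappings for that single file), and the JVM itself, thread stacks, and shared libraries consume map entries too. A quick, Linux-only way to see how close a process is to the limit is to count its entries in /proc/<pid>/maps; here is a small illustrative sketch (not part of Cassandra):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Counts a process's memory mappings on Linux and compares them with the
// sysctl vm.max_map_count limit. Pass a pid, or omit it to inspect yourself.
public class MapCount {
    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        long mappings;
        try (Stream<String> lines = Files.lines(Paths.get("/proc/" + pid + "/maps"))) {
            mappings = lines.count(); // one line per mapping
        }
        long limit = Long.parseLong(
            Files.readAllLines(Paths.get("/proc/sys/vm/max_map_count")).get(0).trim());
        System.out.printf("%d mappings in use, limit is %d (%.1f%% used)%n",
            mappings, limit, 100.0 * mappings / limit);
    }
}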
Re: Cassandra OOM
Hello, all. I've found and fixed the problem today (after one of my nodes OOMed repeatedly while replaying its commit log on start-up): full-key deletes are not accounted for, so column families that receive only delete operations are never flushed. Here is the Jira: https://issues.apache.org/jira/browse/CASSANDRA-3741 and my pull request to fix it: https://github.com/apache/cassandra/pull/5 Best regards, Vitalii Tymchyshyn
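The mechanics of that bug are easy to reproduce in miniature: if the flush trigger only counts the bytes of inserted column data, a memtable that receives nothing but row-level deletes reports a size of zero, never crosses the flush threshold, and its commit log segments are never freed, until replay at start-up has to load the whole backlog. A toy illustration of the accounting problem (invented, heavily simplified classes; not Cassandra's actual memtable code):

import java.util.HashMap;
import java.util.Map;

// Toy memtable: flushes when the tracked payload size crosses a threshold.
class ToyMemtable {
    static final long FLUSH_THRESHOLD = 1024;
    final Map<String, String> rows = new HashMap<String, String>();
    long trackedBytes = 0;

    void insert(String key, String value) {
        rows.put(key, value);
        trackedBytes += key.length() + value.length(); // counted -> can trigger a flush
        maybeFlush();
    }

    // Buggy version: a full-row delete stores a tombstone but adds 0 tracked
    // bytes, so a delete-only workload never reaches the threshold.
    void deleteBuggy(String key) {
        rows.put(key, null); // tombstone
        maybeFlush();
    }

    // Fixed version: account for the tombstone's footprint as well.
    void deleteFixed(String key) {
        rows.put(key, null); // tombstone
        trackedBytes += key.length();
        maybeFlush();
    }

    void maybeFlush() {
        if (trackedBytes >= FLUSH_THRESHOLD) {
            System.out.println("flushing " + rows.size() + " rows");
            rows.clear();
            trackedBytes = 0;
        }
    }

    public static void main(String[] args) {
        ToyMemtable m = new ToyMemtable();
        for (int i = 0; i < 100000; i++) m.deleteBuggy("key" + i); // never flushes
        System.out.println("tombstones held in memory: " + m.rows.size());
    }
}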
Re: Cassandra OOM
Hello. BTW: It would be great for Cassandra to shut down on errors like OOM, because right now I am not sure whether the problem described in the previous email is the root cause, or whether some OOM error found in the log made some writer stop. I am now looking at different OOMs in my cluster. Currently each node has up to 300G of data in ~10 column families. The previous heap size of 3G seems not to be enough; I am raising it to 5G. Looking at heap dumps, a lot of memory is taken by memtables, much more than 1/3 of the heap. At the same time, the logs say there is nothing to flush since there are no dirty memtables. So, what are Cassandra's memory requirements? Is it 1% or 2% of disk data? Or maybe I am doing something wrong? Best regards, Vitalii Tymchyshyn
Re: Cassandra OOM
04.01.12 14:25, Radim Kolar wrote: So, what are Cassandra's memory requirements? Is it 1% or 2% of disk data? It depends on the number of rows you have. If you have a lot of rows, then the primary memory eaters are the index sampling data and the bloom filters. I use an index sampling of 512 and bloom filters set to 4% to cut down the memory needed. I've raised the index sampling, and the bloom filter setting seems not to be in trunk yet. For me, memtables are what's eating the heap :( Best regards, Vitalii Tymchyshyn.
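A rough back-of-the-envelope check of Radim's point; all constants here are illustrative assumptions, not measured Cassandra numbers. With a billion rows, an index interval of 128 keeps ~7.8 million sampled keys on the heap versus ~2 million at 512, while bloom filter size depends only on the row count and the target false-positive rate:

public class IndexMemoryEstimate {
    public static void main(String[] args) {
        long rows = 1000000000L;     // assumed total row (key) count
        int avgKeyBytes = 32;        // assumed average key size
        long perSampleOverhead = 64; // assumed per-entry object/offset overhead

        for (int interval : new int[] {128, 512}) {
            long samples = rows / interval;
            long bytes = samples * (avgKeyBytes + perSampleOverhead);
            System.out.printf("index_interval=%d -> %,d samples, ~%.0f MB heap%n",
                interval, samples, bytes / 1024.0 / 1024.0);
        }

        // A bloom filter targeting ~1% false positives needs ~9.6 bits per key,
        // independent of the index interval.
        double bitsPerKey = 9.6;
        System.out.printf("bloom filters: ~%.1f GB for %,d rows%n",
            rows * bitsPerKey / 8 / 1024 / 1024 / 1024, rows);
    }
}

Raising the interval cuts the sampling memory linearly, which is why it helps; relaxing the bloom filter false-positive target trades read performance against the other big consumer.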
Re: Cassandra OOM
The DynamicSnitch can result in fewer read operations being sent to a node, but as long as a node is marked as UP, mutations are sent to all replicas. Nodes will shed load when they pull messages off the queue that have expired past rpc_timeout, but they will not feed back flow control to the other nodes, other than going down or performing slowly enough for the dynamic snitch to route reads around them. There are also safety valves in there to reduce the size of the memtables and caches in response to low memory. Perhaps that process could also shed messages from thread pools with a high number of pending messages. **But** going OOM with 2M+ mutations in the thread pool sounds like the server was going down anyway. Did you look into why all the messages were there? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 3/01/2012, at 11:18 PM, Віталій Тимчишин wrote: Hello. We have been using Cassandra for some time in our project. Currently we are on the 1.1 trunk (it was an accidental migration, but since it's hard to migrate back and it's performing nicely enough, we are currently on 1.1). During the New Year holidays, one of the servers produced a number of OOM messages in the log. According to the heap dump taken, most of the memory is taken by the MutationStage queue (over 2 million items). So, I am curious now: does Cassandra have any flow control for messages? We are using Quorum for writes, and it seems to me that one slow server may start getting more messages than it can consume. The writes will still succeed, performed by the other servers in the replication set. If there is no flow control, it should eventually get OOM. Is this the case? Are there any plans to handle this? BTW: A lot of memory (~half) is taken by Inet4Address objects, so making a cache of such objects would make this problem less likely. -- Best regards, Vitalii Tymchyshyn
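The flow-control question comes down to the difference between an unbounded inbound queue, which absorbs mutations until the heap is gone, and a bounded queue that pushes back on producers. Here is a minimal sketch of the bounded alternative (invented class and names, purely illustrative; Cassandra's mutation stage at the time queued without such a bound):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy mutation stage with a bounded queue: producers wait for room
// (backpressure) or time out, instead of growing the queue until OOM.
class BoundedMutationStage {
    private final BlockingQueue<Runnable> queue =
        new ArrayBlockingQueue<Runnable>(100000); // capacity = the safety valve

    // Returns false if this node is overloaded; the coordinator can then count
    // the replica as failed for this write instead of queueing it forever.
    boolean submit(Runnable mutation, long timeoutMs) throws InterruptedException {
        return queue.offer(mutation, timeoutMs, TimeUnit.MILLISECONDS);
    }

    void consumeLoop() throws InterruptedException {
        while (true) {
            queue.take().run(); // apply the next mutation
        }
    }
}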
Re: Cassandra OOM on repair.
Looks like the problem is in this code:

public IndexSummary(long expectedKeys)
{
    long expectedEntries = expectedKeys / DatabaseDescriptor.getIndexInterval();
    if (expectedEntries > Integer.MAX_VALUE)
        // TODO: that's a _lot_ of keys, or a very low interval
        throw new RuntimeException("Cannot use index_interval of " + DatabaseDescriptor.getIndexInterval()
                                   + " with " + expectedKeys + " (expected) keys.");
    indexPositions = new ArrayList<KeyPosition>((int) expectedEntries);
}

I have too many keys and too small an index interval. To fix this, I can: 1) reduce the number of keys - rewrite the app and sacrifice balance, or 2) increase index_interval - and hurt other column families. A question: are there any drawbacks to using a different indexInterval for different column families in a keyspace? (Suppose I write a patch.) 2011/7/15 Andrey Stepachev oct...@gmail.com Looks like the key indexes eat all the memory: http://paste.kde.org/97213/ 2011/7/15 Andrey Stepachev oct...@gmail.com UPDATE: I found that a) with min 10G Cassandra survives, b) I have ~1000 sstables, c) CompactionManager uses PrecompactedRows instead of LazilyCompactedRow. So, I have a question: a) if a row is bigger than 64mb before compaction, why is it compacted in memory? b) if it is smaller, what eats so much memory? 2011/7/15 Andrey Stepachev oct...@gmail.com Hi all. Cassandra constantly OOMs on repair or compaction. Increasing memory doesn't help (6G). I can give it more, but I think this is not a regular situation. The cluster has 4 nodes, RF=3, Cassandra version 0.8.1. The ring looks like this:
Address         DC           Rack   Status  State   Load       Owns     Token
                                                                         127605887595351923798765477786913079296
xxx.xxx.xxx.66  datacenter1  rack1  Up      Normal  176.96 GB  25.00%   0
xxx.xxx.xxx.69  datacenter1  rack1  Up      Normal  178.19 GB  25.00%   42535295865117307932921825928971026432
xxx.xxx.xxx.67  datacenter1  rack1  Up      Normal  178.26 GB  25.00%   85070591730234615865843651857942052864
xxx.xxx.xxx.68  datacenter1  rack1  Up      Normal  175.2 GB   25.00%   127605887595351923798765477786913079296
About the schema: I have big rows (100k, up to several million). But as far as I know, that is normal for Cassandra. Everything works relatively well until I start long-running pre-production tests. I load data, and after a while (~4 hours) the cluster begins to time out and then some nodes die with OOM. My app retries the sends, so after a short period all nodes go down. Very nasty. But now I can OOM nodes with a simple call to nodetool repair. In the logs http://paste.kde.org/96811/ it is clear how the heap rocket-jumps to its upper limit. cfstats shows: http://paste.kde.org/96817/ and the config is: http://paste.kde.org/96823/ The question is: does anybody know what this means? Why does Cassandra try to load something so big into memory at once? A.
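Two details in that constructor are worth putting numbers on (the values below are assumed for illustration). First, the guard only fires when expectedKeys exceeds Integer.MAX_VALUE times the index interval, i.e. hundreds of billions of keys at the default interval, so it protects almost nobody. Second, well below the guard, the ArrayList is sized eagerly, so the whole reference array is allocated per SSTable before a single key is read:

public class IndexSummaryMath {
    public static void main(String[] args) {
        int interval = 128; // assumed index_interval

        // Keys needed before the constructor's guard trips:
        long keysAtGuard = (long) Integer.MAX_VALUE * interval;
        System.out.printf("guard trips above %,d keys at interval %d%n",
            keysAtGuard, interval);

        // Far below the guard, eager sizing still pre-allocates ~8 bytes per
        // entry of reference array on a 64-bit JVM, per SSTable, at open time.
        long keys = 2000000000L; // assumed keys in one large SSTable
        long entries = keys / interval;
        System.out.printf("%,d keys -> %,d entries -> ~%d MB of references "
            + "before any KeyPosition is stored%n", keys, entries, entries * 8 >> 20);
    }
}

Multiply the per-SSTable cost by the ~1000 SSTables mentioned in the update above, and the heap jump during repair stops being mysterious.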
Re: Cassandra OOM on repair.
Can't think of any. On Sun, Jul 17, 2011 at 1:27 PM, Andrey Stepachev oct...@gmail.com wrote: A question: are there any drawbacks to using a different indexInterval for different column families in a keyspace? (Suppose I write a patch.) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Cassandra OOM on repair.
Looks like the key indexes eat all the memory: http://paste.kde.org/97213/ 2011/7/15 Andrey Stepachev oct...@gmail.com