[jira] [Commented] (CASSANDRA-11117) ColUpdateTimeDeltaHistogram histogram overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380172#comment-15380172 ]

Jeff Griffith commented on CASSANDRA-11117:
-------------------------------------------

Yes, same here. Upgraded from 2.1.

> ColUpdateTimeDeltaHistogram histogram overflow
> ----------------------------------------------
>
>                 Key: CASSANDRA-11117
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11117
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Chris Lohfink
>            Assignee: Joel Knighton
>            Priority: Minor
>             Fix For: 2.2.x, 3.0.x, 3.x
>
>
> {code}
> getting attribute Mean of
> org.apache.cassandra.metrics:type=ColumnFamily,name=ColUpdateTimeDeltaHistogram
> threw an exception: javax.management.RuntimeMBeanException:
> java.lang.IllegalStateException: Unable to compute ceiling for max when
> histogram overflowed
> {code}
> Given that this histogram already has 164 buckets, I wonder if there is
> something weird with the computation that's causing this to be so large?
> It appears to be coming from updates to system.local:
> {code}
> org.apache.cassandra.metrics:type=Table,keyspace=system,scope=local,name=ColUpdateTimeDeltaHistogram
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
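The "histogram overflowed" state can be pictured with a minimal bucketed-histogram sketch. This is not Cassandra's actual EstimatedHistogram: the ~20% boundary growth and the exact bucket layout are assumptions modeled on the description above. The point it illustrates is that a fixed set of boundaries has a final unbounded "overflow" slot, and any sufficiently large value (such as Long.MAX_VALUE) lands there no matter how many regular buckets exist.

```java
// A minimal sketch of a bucketed histogram with a fixed number of buckets,
// assuming exponentially growing boundaries and a final unbounded overflow
// slot. NOT Cassandra's EstimatedHistogram; the 1.2x growth is an assumption.
import java.util.Arrays;

public class BucketedHistogramSketch {
    final long[] offsets;  // upper boundary of each regular bucket, ascending
    final long[] counts;   // one slot per boundary, plus a final overflow slot

    BucketedHistogramSketch(int bucketCount) {
        offsets = new long[bucketCount];
        counts = new long[bucketCount + 1];
        long boundary = 1;
        for (int i = 0; i < bucketCount; i++) {
            offsets[i] = boundary;
            // grow by at least 1, otherwise ~20% per bucket
            boundary = Math.max(boundary + 1, (long) (boundary * 1.2));
        }
    }

    void update(long value) {
        int idx = Arrays.binarySearch(offsets, value);
        if (idx < 0)
            idx = -idx - 1;   // insertion point: first boundary >= value
        counts[idx]++;        // idx == offsets.length hits the overflow slot
    }

    boolean isOverflowed() {
        return counts[counts.length - 1] > 0;
    }

    public static void main(String[] args) {
        BucketedHistogramSketch h = new BucketedHistogramSketch(164);
        h.update(1000L);
        System.out.println(h.isOverflowed());  // false: fits a regular bucket
        h.update(Long.MAX_VALUE);
        System.out.println(h.isOverflowed());  // true: beyond every boundary
    }
}
```

With 164 buckets at roughly 20% growth, the largest boundary is on the order of 10^13, so a delta anywhere near Long.MAX_VALUE overflows regardless of bucket count, which is consistent with the comment that 164 buckets "already" should be plenty.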
[jira] [Commented] (CASSANDRA-11117) ColUpdateTimeDeltaHistogram histogram overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280185#comment-15280185 ]

Jeff Griffith commented on CASSANDRA-11117:
-------------------------------------------

the code that updates this is here in ColumnFamilyStore.java:

{code}
public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
{
    long start = System.nanoTime();

    Memtable mt = data.getMemtableFor(opGroup, replayPosition);
    final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
    maybeUpdateRowCache(key);
    metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
    metric.writeLatency.addNano(System.nanoTime() - start);
    if (timeDelta < Long.MAX_VALUE)
        metric.colUpdateTimeDeltaHistogram.update(timeDelta);
}
{code}

That "if (timeDelta < Long.MAX_VALUE)" looks ill-conceived, since there are no longs greater than Long.MAX_VALUE, but I don't really know what exactly is overflowing in the histogram.
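One possible reading of that guard, sketched below, is that it is a sentinel check rather than an overflow check: if Memtable.put returns Long.MAX_VALUE to mean "no previous version of this key existed" (an assumption here, not confirmed in this thread), the condition simply skips recording a delta when there is nothing to diff against.

```java
// Hedged sketch of a sentinel-filtering pattern; the NO_PREVIOUS_UPDATE
// semantics for Long.MAX_VALUE are an assumption about Memtable.put,
// and histogramSink stands in for the real histogram metric.
import java.util.ArrayList;
import java.util.List;

public class TimeDeltaGuardSketch {
    // Assumed sentinel: "first write for this key, no previous timestamp".
    static final long NO_PREVIOUS_UPDATE = Long.MAX_VALUE;

    /** Records the delta unless it is the sentinel; returns whether it was recorded. */
    static boolean recordDelta(long timeDelta, List<Long> histogramSink) {
        if (timeDelta >= NO_PREVIOUS_UPDATE)
            return false;              // nothing to record for a first write
        histogramSink.add(timeDelta);
        return true;
    }

    public static void main(String[] args) {
        List<Long> sink = new ArrayList<>();
        System.out.println(recordDelta(42L, sink));             // true: recorded
        System.out.println(recordDelta(Long.MAX_VALUE, sink));  // false: skipped
    }
}
```

Under that reading the guard is intentional, but it would do nothing to protect the histogram from huge-but-not-maximal deltas, which would still overflow the top bucket.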
[jira] [Issue Comment Deleted] (CASSANDRA-11117) ColUpdateTimeDeltaHistogram histogram overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Griffith updated CASSANDRA-11117:
--------------------------------------
    Comment: was deleted

(was: the code that updates this is here in ColumnFamilyStore.java:

{code}
public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
{
    long start = System.nanoTime();

    Memtable mt = data.getMemtableFor(opGroup, replayPosition);
    final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
    maybeUpdateRowCache(key);
    metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
    metric.writeLatency.addNano(System.nanoTime() - start);
    if (timeDelta < Long.MAX_VALUE)
        metric.colUpdateTimeDeltaHistogram.update(timeDelta);
}
{code}

That "if (timeDelta < Long.MAX_VALUE)" looks ill-conceived, since there are no longs greater than Long.MAX_VALUE, but I don't really know what exactly is overflowing in the histogram.)
[jira] [Comment Edited] (CASSANDRA-11117) ColUpdateTimeDeltaHistogram histogram overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280152#comment-15280152 ]

Jeff Griffith edited comment on CASSANDRA-11117 at 5/11/16 1:59 PM:
--------------------------------------------------------------------

the code that updates this is here in ColumnFamilyStore.java:

{code}
public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
{
    long start = System.nanoTime();

    Memtable mt = data.getMemtableFor(opGroup, replayPosition);
    final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
    maybeUpdateRowCache(key);
    metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
    metric.writeLatency.addNano(System.nanoTime() - start);
    if (timeDelta < Long.MAX_VALUE)
        metric.colUpdateTimeDeltaHistogram.update(timeDelta);
}
{code}

That "if (timeDelta < Long.MAX_VALUE)" looks ill-conceived, since there are no longs greater than Long.MAX_VALUE, but I don't really know what exactly is overflowing in the histogram.


was (Author: jeffery.griffith):
the code that updates this is here:

{code}
public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
{
    long start = System.nanoTime();

    Memtable mt = data.getMemtableFor(opGroup, replayPosition);
    final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
    maybeUpdateRowCache(key);
    metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
    metric.writeLatency.addNano(System.nanoTime() - start);
    if (timeDelta < Long.MAX_VALUE)
        metric.colUpdateTimeDeltaHistogram.update(timeDelta);
}
{code}

That "if (timeDelta < Long.MAX_VALUE)" looks ill-conceived, since there are no longs greater than Long.MAX_VALUE, but I don't really know what exactly is overflowing in the histogram.
[jira] [Commented] (CASSANDRA-11117) ColUpdateTimeDeltaHistogram histogram overflow
[ https://issues.apache.org/jira/browse/CASSANDRA-11117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280152#comment-15280152 ]

Jeff Griffith commented on CASSANDRA-11117:
-------------------------------------------

the code that updates this is here:

{code}
public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
{
    long start = System.nanoTime();

    Memtable mt = data.getMemtableFor(opGroup, replayPosition);
    final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
    maybeUpdateRowCache(key);
    metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
    metric.writeLatency.addNano(System.nanoTime() - start);
    if (timeDelta < Long.MAX_VALUE)
        metric.colUpdateTimeDeltaHistogram.update(timeDelta);
}
{code}

That "if (timeDelta < Long.MAX_VALUE)" looks ill-conceived, since there are no longs greater than Long.MAX_VALUE, but I don't really know what exactly is overflowing in the histogram.
[jira] [Commented] (CASSANDRA-11751) Histogram overflow in metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280143#comment-15280143 ]

Jeff Griffith commented on CASSANDRA-11751:
-------------------------------------------

Thanks [~tjake]. Sorry for the duplicate.

> Histogram overflow in metrics
> -----------------------------
>
>                 Key: CASSANDRA-11751
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11751
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.2.6 on Linux
>            Reporter: Jeff Griffith
>
> One particular histogram in the cassandra metrics seems to overflow,
> preventing the calculation of the mean on the dropwizard "Snapshot". Here is
> the exception that comes from the metrics library:
> {code}
> java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
>         at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:232) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
>         at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
>         at com.addthis.metrics3.reporter.config.SplunkReporter.reportHistogram(SplunkReporter.java:155) ~[reporter-config3-3.0.0.jar:3.0.0]
>         at com.addthis.metrics3.reporter.config.SplunkReporter.report(SplunkReporter.java:101) ~[reporter-config3-3.0.0.jar:3.0.0]
>         at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
>         at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72]
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_72]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_72]
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_72]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> {code}
> On deeper analysis, it seems like this is happening specifically on this metric:
> {code}
> ColUpdateTimeDeltaHistogram
> {code}
> I think this is where it is updated in ColumnFamilyStore.java:
> {code}
> public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
> {
>     long start = System.nanoTime();
>
>     Memtable mt = data.getMemtableFor(opGroup, replayPosition);
>     final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
>     maybeUpdateRowCache(key);
>     metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
>     metric.writeLatency.addNano(System.nanoTime() - start);
>     if (timeDelta < Long.MAX_VALUE)
>         metric.colUpdateTimeDeltaHistogram.update(timeDelta);
> }
> {code}
> Considering it's calculating a mean, I don't know if perhaps a large sum
> might be overflowing? But that "if (timeDelta < Long.MAX_VALUE)" looks
> suspect, doesn't it?
[jira] [Created] (CASSANDRA-11751) Histogram overflow in metrics
Jeff Griffith created CASSANDRA-11751:
--------------------------------------

             Summary: Histogram overflow in metrics
                 Key: CASSANDRA-11751
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11751
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Cassandra 2.2.6 on Linux
            Reporter: Jeff Griffith

One particular histogram in the cassandra metrics seems to overflow, preventing the calculation of the mean on the dropwizard "Snapshot". Here is the exception that comes from the metrics library:

{code}
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
        at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:232) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
        at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
        at com.addthis.metrics3.reporter.config.SplunkReporter.reportHistogram(SplunkReporter.java:155) ~[reporter-config3-3.0.0.jar:3.0.0]
        at com.addthis.metrics3.reporter.config.SplunkReporter.report(SplunkReporter.java:101) ~[reporter-config3-3.0.0.jar:3.0.0]
        at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
        at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_72]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_72]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
{code}

On deeper analysis, it seems like this is happening specifically on this metric:

{code}
ColUpdateTimeDeltaHistogram
{code}

I think this is where it is updated in ColumnFamilyStore.java:

{code}
public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
{
    long start = System.nanoTime();

    Memtable mt = data.getMemtableFor(opGroup, replayPosition);
    final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
    maybeUpdateRowCache(key);
    metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
    metric.writeLatency.addNano(System.nanoTime() - start);
    if (timeDelta < Long.MAX_VALUE)
        metric.colUpdateTimeDeltaHistogram.update(timeDelta);
}
{code}

Considering it's calculating a mean, I don't know if perhaps a large sum might be overflowing? But that "if (timeDelta < Long.MAX_VALUE)" looks suspect, doesn't it?
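The rawMean failure in the stack trace can be reproduced in miniature: once any sample lands in the unbounded top bucket, there is no ceiling to assign to those samples, so a mean computed over bucket ceilings cannot be produced. The bucket layout below is illustrative only, not EstimatedHistogram's actual code.

```java
// Illustrative sketch: a mean over per-bucket ceilings that refuses to
// answer once the overflow bucket is non-empty, mirroring the
// IllegalStateException in the stack trace above. Not Cassandra's code.
public class MeanOverflowSketch {
    static double mean(long[] bucketCeilings, long[] counts, long overflowCount) {
        if (overflowCount > 0)
            throw new IllegalStateException(
                    "Unable to compute ceiling for max when histogram overflowed");
        long total = 0, weighted = 0;
        for (int i = 0; i < counts.length; i++) {
            total += counts[i];
            weighted += counts[i] * bucketCeilings[i];  // each sample counted at its ceiling
        }
        return total == 0 ? 0 : (double) weighted / total;
    }

    public static void main(String[] args) {
        long[] ceilings = {1, 2, 4};
        // one sample ceiling 1, one sample ceiling 2 -> mean 1.5
        System.out.println(mean(ceilings, new long[]{1, 1, 0}, 0));
        // any overflowed sample -> IllegalStateException
        mean(ceilings, new long[]{1, 1, 0}, 1);
    }
}
```

This also explains why the reporter hits the error only at read time: updates into the overflow bucket succeed silently, and the exception surfaces later when a reporter (here the Splunk reporter) asks the snapshot for its mean.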
[jira] [Updated] (CASSANDRA-11504) Slow inter-node network growth & gc issues with uptime
[ https://issues.apache.org/jira/browse/CASSANDRA-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Griffith updated CASSANDRA-11504:
--------------------------------------
    Environment: Cassandra 2.1.13 & 2.2.5  (was: Cassandra 2.1.13)

> Slow inter-node network growth & gc issues with uptime
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11504
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11504
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.1.13 & 2.2.5
>            Reporter: Jeff Griffith
>         Attachments: InterNodeTraffic.jpg
>
>
> We are looking for help troubleshooting our production environment where we
> are experiencing GC problems. After much experimentation and troubleshooting
> with various settings, the only correlation that we can find with a slow
> growth in GC is a slow growth in network traffic BETWEEN cassandra nodes in
> our cluster. As an example, I have attached an example where, in a cluster of
> 24 nodes, I restarted 23 of them. Note that the outgoing rate for that 24th
> node remains high while all others drop after the restart. Also note that
> this graph is ONLY traffic between cassandra nodes. Traffic from the clients
> remains FLAT throughout. Analyzing column family stats shows they are flat
> throughout. Cache hit rates are also consistent across nodes. GC is of course
> its own can of worms, so we are hoping this considerable increase in traffic
> (more than double over the course of 6 hrs) between nodes explains it. We
> would greatly appreciate any ideas as to why this extra network output
> correlates to uptime, or ideas on what to "diff" between the nodes.
[jira] [Updated] (CASSANDRA-11504) Slow inter-node network growth & gc issues with uptime
[ https://issues.apache.org/jira/browse/CASSANDRA-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Griffith updated CASSANDRA-11504:
--------------------------------------
    Description:

We are looking for help troubleshooting our production environment where we are experiencing GC problems. After much experimentation and troubleshooting with various settings, the only correlation that we can find with a slow growth in GC is a slow growth in network traffic BETWEEN cassandra nodes in our cluster. As an example, I have attached an example where, in a cluster of 24 nodes, I restarted 23 of them. Note that the outgoing rate for that 24th node remains high while all others drop after the restart. Also note that this graph is ONLY traffic between cassandra nodes. Traffic from the clients remains FLAT throughout. Analyzing column family stats shows they are flat throughout. Cache hit rates are also consistent across nodes. GC is of course its own can of worms, so we are hoping this considerable increase in traffic (more than double over the course of 6 hrs) between nodes explains it. We would greatly appreciate any ideas as to why this extra network output correlates to uptime, or ideas on what to "diff" between the nodes.

(was: We are looking for help troubleshooting our production environment where we are experiencing GC problems. After much experimentation and troubleshooting with various settings, the only correlation that we can find with a slow growth in GC a slow growth in network traffic BETWEEN cassandra nodes in our cluster. As an example, I have attached an example where in a cluster of 24 nodes, i restarted 23 of them. Note that the outgoing rate for that 24th node remains high while all others drop after the restart. Also note that this graph is ONLY traffic between cassandra nodes. Traffic from the clients remains FLAT throughout. Analyzing column family stats shows they are flat throughout. Cache hit rates are also consistent across nodes. GC is of course its own can of worms so we are hoping this considerable increase in traffic (more than double over the course of 6 hrs) between nodes explains it. We would greatly appreciate any ideas as to why this extra network output correlates to uptime or ideas on what to "diff" between the nodes.)
[jira] [Updated] (CASSANDRA-11504) Slow inter-node network growth & gc issues with uptime
[ https://issues.apache.org/jira/browse/CASSANDRA-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Griffith updated CASSANDRA-11504:
--------------------------------------
    Description:

We are looking for help troubleshooting our production environment where we are experiencing GC problems. After much experimentation and troubleshooting with various settings, the only correlation that we can find with a slow growth in GC is a slow growth in network traffic BETWEEN cassandra nodes in our cluster. As an example, I have attached an example where, in a cluster of 24 nodes, I restarted 23 of them. Note that the outgoing rate for that 24th node remains high while all others drop after the restart. Also note that this graph is ONLY traffic between cassandra nodes. Traffic from the clients remains FLAT throughout. Analyzing column family stats shows they are flat throughout. Cache hit rates are also consistent across nodes. GC is of course its own can of worms, so we are hoping this considerable increase in traffic (more than double over the course of 6 hrs) between nodes explains it. We would greatly appreciate any ideas as to why this extra network output correlates to uptime, or ideas on what to "diff" between the nodes.

(was: We are looking for help troubleshooting our production environment where we are experiencing GC problems. After much experimentation and troubleshooting with various settings, the only correlation that we can find with a slow growth in GC a slow growth in network traffic BETWEEN cassandra nodes in our cluster. As an example, I have attached an example where in a cluster of 24 nodes, i restarted 23 of them. Note that the outgoing rate for that 24th node remains high while all others drop after the restart. Also note that this graph is ONLY traffic between cassandra nodes. Traffic from the clients remains FLAT throughout. Analyzing column family stats shows they are flat throughout. Cache hit rates are also consistent across nodes. GC is of course its own can of worms so we are hoping this considerable increase in traffic (more than double over the course of 6 hrs) between nodes explains it. We would greatly appreciate any ideas as to why this extra network output correlates to uptime.)
[jira] [Created] (CASSANDRA-11504) Slow inter-node network growth & gc issues with uptime
Jeff Griffith created CASSANDRA-11504:
--------------------------------------

             Summary: Slow inter-node network growth & gc issues with uptime
                 Key: CASSANDRA-11504
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11504
             Project: Cassandra
          Issue Type: Bug
         Environment: Cassandra 2.1.13
            Reporter: Jeff Griffith
         Attachments: InterNodeTraffic.jpg

We are looking for help troubleshooting our production environment where we are experiencing GC problems. After much experimentation and troubleshooting with various settings, the only correlation that we can find with a slow growth in GC is a slow growth in network traffic BETWEEN cassandra nodes in our cluster. As an example, I have attached an example where, in a cluster of 24 nodes, I restarted 23 of them. Note that the outgoing rate for that 24th node remains high while all others drop after the restart. Also note that this graph is ONLY traffic between cassandra nodes. Traffic from the clients remains FLAT throughout. Analyzing column family stats shows they are flat throughout. Cache hit rates are also consistent across nodes. GC is of course its own can of worms, so we are hoping this considerable increase in traffic (more than double over the course of 6 hrs) between nodes explains it. We would greatly appreciate any ideas as to why this extra network output correlates to uptime.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035856#comment-15035856 ]

Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------

That's correct. Two problems led independently to the build-up:

Cause 1 (fixed by Marcus): sstable leveling info was lost during sstable upgrade, leading to thread contention due to a large number of tables at L0.
Cause 2 (fixed by Benedict): an index-out-of-bounds exception caused by integer overflow.

> Commit logs back up with move to 2.1.10
> ---------------------------------------
>
>                 Key: CASSANDRA-10515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: redhat 6.5, cassandra 2.1.10
>            Reporter: Jeff Griffith
>            Assignee: Branimir Lambov
>              Labels: commitlog, triage
>             Fix For: 3.0.1, 3.1, 2.1.x, 2.2.x
>
>         Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, CommitLogProblem.jpg, CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems
> where some nodes break the 12G commit log max we configured and go as high as
> 65G or more before it restarts. Once it reaches the state of more than 12G of
> commit log files, "nodetool compactionstats" hangs. Eventually C* restarts
> without errors (not sure yet whether it is crashing, but I'm checking into it)
> and the cleanup occurs and the commit logs shrink back down again. Here is
> the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558  bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916  bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682  bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557  bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621  bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196  bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438  bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582  bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats", which were working but not being
> logged in system.log on the StatusLogger thread until after the compaction
> started working again.
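The "index out of bounds caused by integer overflow" failure mode named as Cause 2 is generic enough to sketch (this is not the actual Cassandra code, just the arithmetic pattern): a product of two ints wraps silently past Integer.MAX_VALUE, and the resulting negative value later blows up as an offset, size, or array index.

```java
// Generic illustration of silent 32-bit multiplication overflow;
// the record/offset names are hypothetical, not from Cassandra.
public class IntOverflowSketch {
    // 32-bit product wraps silently past Integer.MAX_VALUE.
    static int riskyOffset(int records, int recordSize) {
        return records * recordSize;          // can wrap to a negative int
    }

    // Widening one operand to long before multiplying avoids the wrap.
    static long safeOffset(int records, int recordSize) {
        return (long) records * recordSize;
    }

    public static void main(String[] args) {
        System.out.println(riskyOffset(100_000, 30_000)); // negative: wrapped
        System.out.println(safeOffset(100_000, 30_000));  // 3000000000
    }
}
```

100,000 * 30,000 = 3,000,000,000 exceeds Integer.MAX_VALUE (2,147,483,647), so the int product wraps negative while the widened long product is correct; an array indexed with the wrapped value throws ArrayIndexOutOfBoundsException.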
[jira] [Commented] (CASSANDRA-10692) Don't remove level info when doing upgradesstables
[ https://issues.apache.org/jira/browse/CASSANDRA-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018277#comment-15018277 ]

Jeff Griffith commented on CASSANDRA-10692:
-------------------------------------------

[~krummas] can you confirm when the last commit was for this fix on the cassandra-2.1 branch? In the comments it looks like you pushed something else 2 days ago (Nov 18?) but all I see is this back on Nov 12:

commit 246cb883ab09bc69e842b8124c1537b38bb54335
Author: Marcus Eriksson
Date:   Thu Nov 12 08:12:01 2015 +0100

Thanks.

> Don't remove level info when doing upgradesstables
> --------------------------------------------------
>
>                 Key: CASSANDRA-10692
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10692
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>             Fix For: 2.1.12, 2.2.4
>
>
> Seems we blow away the level info when doing upgradesstables. Introduced in CASSANDRA-8004
[jira] [Comment Edited] (CASSANDRA-10692) Don't remove level info when doing upgradesstables
[ https://issues.apache.org/jira/browse/CASSANDRA-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018277#comment-15018277 ]

Jeff Griffith edited comment on CASSANDRA-10692 at 11/20/15 4:41 PM:
---------------------------------------------------------------------

[~krummas] can you confirm when the last commit was for this fix on the cassandra-2.1 branch? In the comments it looks like you pushed something else 2 days ago (Nov 18?) but all I see is this back on Nov 12:

commit 246cb883ab09bc69e842b8124c1537b38bb54335
Author: Marcus Eriksson
Date:   Thu Nov 12 08:12:01 2015 +0100

(asking because I produced my own build after the 12th but before the 18th)

Thanks.


was (Author: jeffery.griffith):
[~krummas] can you confirm when the last commit was for this fix on the cassandra-2.1 branch? In the comments it looks like you pushed something else 2 days ago (Nov 18?) but all I see is this back on Nov 12:

commit 246cb883ab09bc69e842b8124c1537b38bb54335
Author: Marcus Eriksson
Date:   Thu Nov 12 08:12:01 2015 +0100

Thanks.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002048#comment-15002048 ] Jeff Griffith commented on CASSANDRA-10515: --- Thanks [~krummas] i assume you mean this explains the large number of sstables (55k) we experienced? I see you've fixed it. I have moved to the latest 2.1 so this should help with our rollout. > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov > Labels: commitlog, triage > Fix For: 3.1, 2.1.x, 2.2.x > > Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, > CommitLogProblem.jpg, CommitLogSize.jpg, > MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, > cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002085#comment-15002085 ] Jeff Griffith commented on CASSANDRA-10515: --- Good to know. We'll watch out for it and use the offline leveling trick you suggested. > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov > Labels: commitlog, triage > Fix For: 3.1, 2.1.x, 2.2.x > > Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, > CommitLogProblem.jpg, CommitLogSize.jpg, > MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, > cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
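The failure mode in this ticket — segments accumulating past the configured 12G cap until a restart clears them — can be watched for from outside the JVM. A minimal sketch, not Cassandra code: it assumes segments follow the CommitLog-<version>-<id>.log naming visible in replay logs on related tickets, and the directory path and 12G threshold are taken from this report and would need adjusting per deployment.

```java
import java.io.File;

// Illustrative external monitor: sums on-disk commit log segment sizes
// and reports whether the total has blown past a configured cap.
public class CommitLogSizeCheck {
    static final long MAX_BYTES = 12L * 1024 * 1024 * 1024; // 12G cap from this report

    static long totalSegmentBytes(File commitLogDir) {
        long total = 0;
        // only count CommitLog-*.log segment files, not other debris
        File[] segments = commitLogDir.listFiles(
                (dir, name) -> name.startsWith("CommitLog-") && name.endsWith(".log"));
        if (segments != null)
            for (File f : segments)
                total += f.length();
        return total;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "/home/y/var/cassandra/commitlog");
        long bytes = totalSegmentBytes(dir);
        System.out.printf("commit log size: %d bytes (%s cap)%n",
                          bytes, bytes > MAX_BYTES ? "OVER" : "under");
    }
}
```

Run periodically (e.g. from cron) and alert on "OVER", which in the scenario above would fire well before the 65G peak.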
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000585#comment-15000585 ] Jeff Griffith commented on CASSANDRA-10579: --- (i'll try to move things to 2.1.11 to simplify this) > IndexOutOfBoundsException during memtable flushing at startup (with > offheap_objects) > > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Environment: 2.1.10 on linux >Reporter: Jeff Griffith >Assignee: Benedict > Fix For: 2.1.12, 2.2.4 > > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying > 
/home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > 
~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_31] > at >
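The workaround described above — "track down the corresponding commit log via the segment ID and remove it" — amounts to matching the segment named in the replay failure against the file names in the commitlog directory. A hypothetical helper (not Cassandra's own parser) that pulls the numeric segment id out of a name like the CommitLog-4-1445474832694.log seen in the log lines:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration: extract the segment id from a commit log file name of
// the form CommitLog-<CL version>-<segment id>.log, so the segment
// named in a replay failure can be located and moved aside (with the
// node stopped) before restarting.
public class SegmentId {
    private static final Pattern NAME = Pattern.compile("CommitLog-(\\d+)-(\\d+)\\.log");

    static long idOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches())
            throw new IllegalArgumentException("not a commit log segment: " + fileName);
        return Long.parseLong(m.group(2)); // group(1) is the CL version
    }

    public static void main(String[] args) {
        System.out.println(idOf("CommitLog-4-1445474832694.log")); // 1445474832694
    }
}
```

Moving the offending segment into a quarantine directory, rather than deleting it, keeps it available for later inspection.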
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000519#comment-15000519 ] Jeff Griffith commented on CASSANDRA-10579: --- hi again [~benedict] sorry to bug you with this again, but could you pls confirm what is on your 10579-fix branch? i'm trying to merge a few patches and it looks like there are several other things mixed in now. at one point, it was strictly based on 2.1.10. thx, --jg > IndexOutOfBoundsException during memtable flushing at startup (with > offheap_objects) > > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Environment: 2.1.10 on linux >Reporter: Jeff Griffith >Assignee: Benedict > Fix For: 2.1.12, 2.2.4 > > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 
2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > 
org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) >
[jira] [Comment Edited] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000585#comment-15000585 ] Jeff Griffith edited comment on CASSANDRA-10579 at 11/11/15 4:29 PM: - (i'll try to move things to 2.1.11 to simplify this. looks like it's based on the 2.1 branch though.) was (Author: jeffery.griffith): (i'll try to move things to 2.1.11 to simplify this) > IndexOutOfBoundsException during memtable flushing at startup (with > offheap_objects) > > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Environment: 2.1.10 on linux >Reporter: Jeff Griffith >Assignee: Benedict > Fix For: 2.1.12, 2.2.4 > > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying > 
/home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) > 
~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] >
[jira] [Comment Edited] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000585#comment-15000585 ] Jeff Griffith edited comment on CASSANDRA-10579 at 11/11/15 4:29 PM: - (i'll try to move things to 2.1.11 to simplify this. looks like it's based on the main 2.1 branch though.) was (Author: jeffery.griffith): (i'll try to move things to 2.1.11 to simplify this. looks like it's based on the 2.1 branch though.) > IndexOutOfBoundsException during memtable flushing at startup (with > offheap_objects) > > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Environment: 2.1.10 on linux >Reporter: Jeff Griffith >Assignee: Benedict > Fix For: 2.1.12, 2.2.4 > > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 
14:43:36,699 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > 
org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) >
[jira] [Commented] (CASSANDRA-7408) System hints corruption - dataSize ... would be larger than file
[ https://issues.apache.org/jira/browse/CASSANDRA-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000950#comment-15000950 ] Jeff Griffith commented on CASSANDRA-7408: -- no problem [~iamaleksey] i seem to recall this being related to an issue i reported separately where a short integer was overflowing. pretty sure it's all good now. > System hints corruption - dataSize ... would be larger than file > > > Key: CASSANDRA-7408 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7408 > Project: Cassandra > Issue Type: Bug > Environment: RHEL 6.5 > Cassandra 1.2.16 > RF=3 > Thrift >Reporter: Jeff Griffith > > I've found several unresolved JIRA tickets related to SSTable corruption but > not sure if they apply to the case we are seeing in system/hints. We see > periodic exceptions such as: > {noformat} > dataSize of 144115248479299639 starting at 17209 would be larger than file > /home/y/var/cassandra/data/system/hints/system-hints-ic-219-Data.db length > 35542 > {noformat} > Is there something we could possibly be doing from the application to cause > this sort of corruption? We also see it on some of our own column families > also some *negative* lengths which are presumably a similar corruption. 
> {noformat} > ERROR [HintedHandoff:57] 2014-06-17 17:08:04,690 CassandraDaemon.java (line > 191) Exception in thread Thread[HintedHandoff:57,1,main] > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: > dataSize of 144115248479299639 starting at 17209 would be larger than file > /home/y/var/cassandra/data/system/hints/system-hints-ic-219-Data.db length > 35542 > at > org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:441) > at > org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282) > at > org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90) > at > org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:508) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.ExecutionException: > org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: > dataSize of 144115248479299639 starting at 17209 would be larger than file > /home/y/var/cassandra/data/system/hints/system-hints-ic-219-Data.db length > 35542 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:437) > ... 
6 more > Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: > java.io.IOException: dataSize of 144115248479299639 starting at 17209 would > be larger than file > /home/y/var/cassandra/data/system/hints/system-hints-ic-219-Data.db length > 35542 > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:167) > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:83) > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:69) > at > org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180) > at > org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155) > at > org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142) > at > org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38) > at > org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96) > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > at > org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:145) > at > org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at >
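The quoted failure is a sanity check rejecting a length field that cannot possibly fit in the file. A stripped-down illustration of the idea — the names here are invented, not the actual SSTableIdentityIterator internals: read the 8-byte size and validate it against the file length before trusting it, so a corrupted value like 144115248479299639 raises an error instead of driving a huge read.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Illustration only: mimics the kind of guard that produces the
// "dataSize ... would be larger than file" error in this ticket.
public class DataSizeGuard {
    static long readValidatedSize(DataInputStream in, long startOffset, long fileLength)
            throws IOException {
        long dataSize = in.readLong();
        // negative sizes cover the "*negative* lengths" case from the report
        if (dataSize < 0 || startOffset + dataSize > fileLength)
            throw new IOException("dataSize of " + dataSize + " starting at " + startOffset
                                  + " would be larger than file length " + fileLength);
        return dataSize;
    }

    public static void main(String[] args) throws IOException {
        // big-endian encoding of the bogus size from the stack trace
        byte[] corrupt = ByteBuffer.allocate(8).putLong(144115248479299639L).array();
        try {
            readValidatedSize(new DataInputStream(new ByteArrayInputStream(corrupt)), 17209, 35542);
        } catch (IOException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```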
[jira] [Comment Edited] (CASSANDRA-7408) System hints corruption - dataSize ... would be larger than file
[ https://issues.apache.org/jira/browse/CASSANDRA-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000950#comment-15000950 ] Jeff Griffith edited comment on CASSANDRA-7408 at 11/11/15 7:46 PM: no problem [~iamaleksey] i seem to recall this being related to an issue i reported separately that was fixed where a short integer was overflowing. pretty sure it's all good now. was (Author: jeffery.griffith): no problem [~iamaleksey] i seem to recall this being related to an issue i reported separately where a short integer was overflowing. pretty sure it's all good now. > System hints corruption - dataSize ... would be larger than file > > > Key: CASSANDRA-7408 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7408 > Project: Cassandra > Issue Type: Bug > Environment: RHEL 6.5 > Cassandra 1.2.16 > RF=3 > Thrift >Reporter: Jeff Griffith > > I've found several unresolved JIRA tickets related to SSTable corruption but > not sure if they apply to the case we are seeing in system/hints. We see > periodic exceptions such as: > {noformat} > dataSize of 144115248479299639 starting at 17209 would be larger than file > /home/y/var/cassandra/data/system/hints/system-hints-ic-219-Data.db length > 35542 > {noformat} > Is there something we could possibly be doing from the application to cause > this sort of corruption? We also see it on some of our own column families > also some *negative* lengths which are presumably a similar corruption. 
> {noformat} > ERROR [HintedHandoff:57] 2014-06-17 17:08:04,690 CassandraDaemon.java (line > 191) Exception in thread Thread[HintedHandoff:57,1,main] > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: > dataSize of 144115248479299639 starting at 17209 would be larger than file > /home/y/var/cassandra/data/system/hints/system-hints-ic-219-Data.db length > 35542 > at > org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:441) > at > org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282) > at > org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90) > at > org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:508) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.ExecutionException: > org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: > dataSize of 144115248479299639 starting at 17209 would be larger than file > /home/y/var/cassandra/data/system/hints/system-hints-ic-219-Data.db length > 35542 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:437) > ... 
6 more > Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: > java.io.IOException: dataSize of 144115248479299639 starting at 17209 would > be larger than file > /home/y/var/cassandra/data/system/hints/system-hints-ic-219-Data.db length > 35542 > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:167) > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:83) > at > org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIdentityIterator.java:69) > at > org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180) > at > org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155) > at > org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142) > at > org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38) > at > org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122) > at > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96) > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > at >
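The comment above recalls the root cause as a short integer overflowing. A hypothetical sketch of that bug class, not the actual patched code: a length larger than 32767 squeezed through a 16-bit field wraps, which is one way to get the negative lengths mentioned in the report.

```java
// Illustration: truncating a length to 16 bits silently drops the high
// bits, and values with bit 15 set read back negative as a signed short.
public class ShortOverflow {
    static short truncate(int length) {
        return (short) length; // keeps only the low 16 bits
    }

    public static void main(String[] args) {
        int realLength = 40000;
        short stored = truncate(realLength);
        System.out.println(stored); // -25536, not 40000
    }
}
```

Validating lengths against the declared field width at write time (or widening the field) avoids the wraparound entirely.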
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985217#comment-14985217 ] Jeff Griffith commented on CASSANDRA-10579: --- Does the NodeBuilder thing prevent me from going to prod with your branch [~benedict] ? > IndexOutOfBoundsException during memtable flushing at startup (with > offheap_objects) > > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Environment: 2.1.10 on linux >Reporter: Jeff Griffith >Assignee: Benedict > Fix For: 2.1.x > > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - 
Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > 
~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_31] > at >
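The manual recovery described in the report — tracking down the offending commit log via the segment ID in its filename and removing it — can be sketched as below. Segment files follow the CommitLog-<version>-<segmentId>.log naming visible in the replay messages; the lookup helper is illustrative, not a Cassandra tool:

```python
import os
import re

def find_segment(commitlog_dir: str, segment_id: int) -> list:
    """Return commit log files whose name carries the given segment ID.

    Segment files are named CommitLog-<version>-<segmentId>.log, e.g.
    CommitLog-4-1445474832694.log in the replay messages above.
    """
    pattern = re.compile(r"CommitLog-(\d+)-(\d+)\.log$")
    hits = []
    for name in sorted(os.listdir(commitlog_dir)):
        m = pattern.match(name)
        if m and int(m.group(2)) == segment_id:
            hits.append(os.path.join(commitlog_dir, name))
    return hits
```

The matched file is what the reporter deletes by hand before restarting the node.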
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980375#comment-14980375 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/29/15 12:58 PM: -- [~blambov] [~benedict] See image: Killed two birds with one stone here it seems. Looking at the logs before the commit log growth, it looks like the IndexOutOfBounds exceptions affected all nodes in this small cluster of 3 at the same time; with RF=3 that probably makes sense, doesn't it? https://issues.apache.org/jira/secure/attachment/12769525/CASSANDRA-19579.jpg was (Author: jeffery.griffith): [~blambov] [~benedict] Killed two birds with one stone here it seems. Looking at the logs before the commit log growth, it looks like the IndexOutOfBounds exceptions affected all nodes in this small cluster of 3 at the same time; with RF=3 that probably makes sense, doesn't it? > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, > CommitLogProblem.jpg, CommitLogSize.jpg, > MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, > cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
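The failure mode in the description — on-disk commit log segments blowing past the configured 12G cap to 65G or more — is easy to watch for externally. A rough monitoring sketch (the directory path and cap value are illustrative; this is not a Cassandra API):

```python
import os

def commitlog_backlog(commitlog_dir: str, cap_bytes: int):
    """Total on-disk size of CommitLog-*.log segments, and overshoot past cap.

    A non-zero overshoot reproduces the condition reported here: segments
    accumulating well beyond commitlog_total_space as configured in
    cassandra.yaml.
    """
    total = sum(
        os.path.getsize(os.path.join(commitlog_dir, f))
        for f in os.listdir(commitlog_dir)
        if f.startswith("CommitLog-") and f.endswith(".log"))
    return total, max(0, total - cap_bytes)
```

Alerting on the overshoot would have flagged these nodes long before they reached 65G.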
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515: -- Attachment: CASSANDRA-19579.jpg [~blambov] [~benedict] Killed two birds with one stone here it seems. Looking at the logs before the commit log growth, it looks like the IndexOutOfBounds exceptions affected all nodes in this small cluster of 3 at the same time; with RF=3 that probably makes sense, doesn't it? > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, > CommitLogProblem.jpg, CommitLogSize.jpg, > MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, > cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980375#comment-14980375 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/29/15 12:59 PM: -- [~blambov] [~benedict] See image: Killed two birds with one stone here it seems. Looking at the logs before the commit log growth, it looks like the IndexOutOfBounds exceptions affected all nodes in this small cluster of 3 at the same time; with RF=3 that probably makes sense, doesn't it? https://issues.apache.org/jira/secure/attachment/12769525/CASSANDRA-19579.jpg was (Author: jeffery.griffith): [~blambov] [~benedict] See image: Killed two birds with one stone here it seems. Looking at the logs before the commit log growth, it looks like the IndexOutOfBounds exceptions affected all nodes in this small cluster of 3 at the same time; with RF=3 that probably makes sense, doesn't it? https://issues.apache.org/jira/secure/attachment/12769525/CASSANDRA-19579.jpg > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CASSANDRA-19579.jpg, > CommitLogProblem.jpg, CommitLogSize.jpg, > MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, cassandra.yaml, > cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978368#comment-14978368 ] Jeff Griffith commented on CASSANDRA-10515: --- Great info, thanks [~benedict] ! I guess my "all roads lead to Rome" analogy was a good one :-) [~blambov] I have the test running now. It normally happens a couple of times a day so I should know by this evening. On that first form of growth, I unfortunately did not get a chance to try it before the problem corrected itself. I believe that gradually our 3 remaining problematic nodes took longer to reduce the # of L0 files than did the rest of our clusters. It took weeks rather than days and coincidentally I got involved near the end. From what I saw, though, the symptoms seemed to match exactly what [~krummas] described. > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, > CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, > cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
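The progress column in the compactionstats output above is simply completed/total, rounded to two decimals; checking it against the first two rows:

```python
def progress(completed: int, total: int) -> float:
    """Progress percentage as shown by nodetool compactionstats."""
    return round(100.0 * completed / total, 2)

# rows from the output above
assert progress(61251208033, 170643574558) == 35.89  # *cf1*
assert progress(19262483904, 19266079916) == 99.98   # *cf2*
```

Rows at 100.00% (e.g. *cf4*: 3411039555 of 3411039557) are effectively done but still listed, which is consistent with the reported hang: completed work is not being cleared from the pending queue.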
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978368#comment-14978368 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/28/15 12:58 PM: -- Great info, thanks [~benedict] ! I guess my "all roads lead to Rome" analogy was a good one :-) [~blambov] I have the test running now. It normally happens a couple of times a day so I should know by this evening. On that first form of growth, I unfortunately did not get a chance to try it before the problem corrected itself. I believe that gradually our 3 remaining problematic nodes took longer to reduce the # of L0 files than did the rest of our clusters. It took weeks rather than days and coincidentally I got involved near the end. From what I saw, though, the symptoms seemed to match exactly what [~krummas] described. was (Author: jeffery.griffith): Great info, thanks [~benedict] ! I guess my "all roads lead to Rome" analogy was a good one :-) [~blambov] I have the test running now. It normally happens a couple of times a day so I should know by this evening. On that first form of growth, I unfortunately did not get a chance to try it before the problem corrected itself. I believe that gradually our 3 remaining problematic nodes took longer to reduce the # of L0 files than did the rest of our clusters. It took weeks rather than days and coincidentally I got involved near the end. From what I saw, though, the symptoms seemed to match exactly what [~krummas] described.
> Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, > CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, > cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978368#comment-14978368 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/28/15 1:00 PM: - Great info, thanks [~benedict] ! I guess my "all roads lead to Rome" analogy was a good one :-) [~blambov] I have the test running now. It normally happens a couple of times a day so I should know by this evening. On that first form of growth, I unfortunately did not get a chance to try it before the problem corrected itself. I believe that gradually our 3 remaining problematic nodes took longer to reduce the # of L0 files than did the rest of our clusters. It took weeks rather than days and coincidentally I got involved near the end. From what I saw, though, the symptoms seemed to match exactly what [~krummas] described. was (Author: jeffery.griffith): Great info, thanks [~benedict] ! I guess my "all roads lead to Rome" analogy was a good one :-) [~blambov] I have the test running now. It normally happens a couple of times a day so I should know by this evening. On that first form of growth, I unfortunately did not get a chance to try it before the problem corrected itself. I believe that gradually our 3 remaining problematic nodes took longer to reduce the # of L0 files than did the rest of our clusters. It took weeks rather than days and coincidentally I got involved near the end. From what I saw, though, the symptoms seemed to match exactly what [~krummas] described.
> Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, > CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, > cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976555#comment-14976555 ] Jeff Griffith commented on CASSANDRA-10579: --- So [~benedict] I rebuilt 2.1.10 with a merge of your diagnostic patch plus the changes you mention above for integer overflow. I tried this on a node where i had re-enabled assertions. i THINK but i am not certain that the assertions suppress seeing the commit log IndexOutOfBounds exception, i will confirm this. but the GOOD news is that this version DOES seem to fix the startup problem! I will confirm this on the next node that fails where assertions are off. > IndexOutOfBoundsException during memtable flushing at startup (with > offheap_objects) > > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: 2.1.10 on linux >Reporter: Jeff Griffith >Assignee: Benedict > Fix For: 2.1.x > > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. 
Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > 
~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at >
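The "changes you mention above for integer overflow" are not shown in this thread; as a generic illustration of the failure class being discussed — a size or offset computed in 32-bit arithmetic silently wrapping negative, which can then index off the end of a structure — here is a hypothetical sketch, not the actual Cassandra fix:

```python
def to_int32(x: int) -> int:
    """Interpret x modulo 2**32 as a signed 32-bit value, as Java int math does."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

# an offset computed as count * width in int arithmetic silently wraps...
assert to_int32(50000 * 50000) == -1794967296
# ...whereas doing the multiply in 64-bit (Java long) keeps the true value
assert 50000 * 50000 == 2500000000
```

A wrapped value like this, used as a cell offset, is one plausible route to the ArrayIndexOutOfBoundsException seen in the off-heap cell code above.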
[jira] [Comment Edited] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976555#comment-14976555 ] Jeff Griffith edited comment on CASSANDRA-10579 at 10/27/15 3:34 PM: - So [~benedict] I rebuilt 2.1.10 with a merge of your diagnostic patch plus the changes you mention above for integer overflow. I tried this on a node where i had re-enabled assertions. i THINK but i am not certain that the assertions suppress seeing the commit log IndexOutOfBounds exception, i will confirm this. but the GOOD news is that this version DOES seem to fix the startup problem! I will confirm this on the next node that fails where assertions are off. By the way, it seems like this may also be leading to sstable corruption (probably not surprising since it's flushing sstables when the IOOB exception happens?) was (Author: jeffery.griffith): So [~benedict] I rebuilt 2.1.10 with a merge of your diagnostic patch plus the changes you mention above for integer overflow. I tried this on a node where i had re-enabled assertions. i THINK but i am not certain that the assertions suppress seeing the commit log IndexOutOfBounds exception, i will confirm this. but the GOOD news is that this version DOES seem to fix the startup problem! I will confirm this on the next node that fails where assertions are off. > IndexOutOfBoundsException during memtable flushing at startup (with > offheap_objects) > > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: 2.1.10 on linux >Reporter: Jeff Griffith >Assignee: Benedict > Fix For: 2.1.x > > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. 
Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > 
~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
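The comments above attribute the ArrayIndexOutOfBoundsException in AbstractNativeCell.nametype to an integer overflow. As a minimal sketch of the general bug pattern — the names and shapes here are hypothetical, not Cassandra's actual code — the classic Java mistake is an `int` product that wraps around before being widened to `long`, producing a garbage offset that later indexes off the end of an array:

```java
public class OverflowDemo {
    // Hypothetical offset computation: the int multiplication overflows
    // BEFORE the result is widened to long, so the returned offset is wrong.
    static long brokenOffset(int count, int sizePerEntry) {
        return count * sizePerEntry;        // int math wraps around first
    }

    // Widening one operand to long forces 64-bit multiplication.
    static long fixedOffset(int count, int sizePerEntry) {
        return (long) count * sizePerEntry; // correct 64-bit product
    }

    public static void main(String[] args) {
        int count = 70_000, size = 70_000;  // true product: 4,900,000,000
        System.out.println(brokenOffset(count, size)); // wrapped: 605032704
        System.out.println(fixedOffset(count, size));  // 4900000000
    }
}
```

A wrapped offset like this can land anywhere, which would explain downstream symptoms such as a bogus type-ordinal lookup or flushed-sstable corruption.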
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976664#comment-14976664 ] Jeff Griffith commented on CASSANDRA-10579: --- Yes, we are seeing sstable corruption also which we scrub. Not 100% certain it results from this index out of bounds problem though.
[jira] [Comment Edited] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976864#comment-14976864 ] Jeff Griffith edited comment on CASSANDRA-10579 at 10/27/15 6:09 PM: - perfect. thanks again.
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976703#comment-14976703 ] Jeff Griffith commented on CASSANDRA-10579: --- my pleasure, thanks for the patch! we are running on 2.1.10. is the patch only for 2.1.11?
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976864#comment-14976864 ] Jeff Griffith commented on CASSANDRA-10579: --- perfect. thanks.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977052#comment-14977052 ] Jeff Griffith commented on CASSANDRA-10515: --- [~krummas] [~tjake] something interesting on this second form of commit log growth, where all nodes had uncontrolled commit log growth, unlike the first example (many files in L0) where it was isolated nodes. For this latter case, I think I'm able to relate this to a separate problem with an index out of bounds exception. Working with [~benedict], it seems like we have that one solved. I'm hopeful that patch will solve this growing commit log problem as well. It seems like all roads lead to Rome, where Rome is commit log growth :-) Here is the other JIRA identifying an integer overflow in AbstractNativeCell.java: https://issues.apache.org/jira/browse/CASSANDRA-10579 Still uncertain how to proceed with the first form, which seems to be starvation as you have described. > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, > CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, > cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. 
Here is > the nodetool compactionstats immediately after restart. > {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
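The description above reports a configured 12G commit log cap being blown past 65G before anything in Cassandra notices. A simple external watchdog for that symptom — a sketch with hypothetical class and method names, not part of Cassandra — just sums the sizes of the CommitLog-*.log files in the commit log directory and flags when the cap is exceeded:

```java
import java.io.File;

public class CommitLogWatch {
    // Sum the sizes of CommitLog-*.log files in a directory; returns 0 if
    // the directory is missing or unreadable (listFiles returns null).
    static long totalBytes(File dir) {
        long total = 0;
        File[] files = dir.listFiles(
            (d, name) -> name.startsWith("CommitLog-") && name.endsWith(".log"));
        if (files != null)
            for (File f : files)
                total += f.length();
        return total;
    }

    static boolean overCap(File dir, long capBytes) {
        return totalBytes(dir) > capBytes;
    }

    public static void main(String[] args) {
        long cap = 12L * 1024 * 1024 * 1024; // the 12G cap mentioned above
        File dir = new File(args.length > 0 ? args[0]
                                            : "/home/y/var/cassandra/commitlog");
        System.out.println(totalBytes(dir) + " bytes; over cap: "
                           + overCap(dir, cap));
    }
}
```

Run periodically (e.g. from cron), this would have alerted well before the directory reached 65G, independently of whether "nodetool compactionstats" was hanging.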
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977052#comment-14977052 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/27/15 7:59 PM: - [~krummas] [~tjake] something interesting on this second form of commit log growth, where all nodes had uncontrolled commit log growth, unlike the first example (many files in L0) where it was isolated nodes. For this latter case, I think I'm able to relate this to a separate problem with an index out of bounds exception. Working with [~benedict], it seems like we have that one solved. I'm hopeful that patch will solve this growing commit log problem as well. It seems like all roads lead to Rome, where Rome is commit log growth :-) Here is the other JIRA identifying an integer overflow in AbstractNativeCell.java: https://issues.apache.org/jira/browse/CASSANDRA-10579 Still uncertain how to proceed with the first form, which seems to be starvation as you have described.
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974292#comment-14974292 ] Jeff Griffith commented on CASSANDRA-10579: --- Thanks [~benedict], I'll try to capture those. It seems to be tricky to identify the specific commit log causing the problem. I'm trying to do some math with the segment ID but haven't quite figured out how to isolate it. Either way, I'll try to attach something useful shortly. Re the previous version: we have seen this before, but since we just upgraded to 2.1.10 it does seem to be becoming a more frequent occurrence.
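The "math with the segment ID" mentioned above can start from the file name itself: the replay log lines show names of the form CommitLog-<version>-<segmentId>.log (e.g. version 4, segment ID 1445474832694), and the IDs appear to be assigned in increasing order, so they sort with creation time. A minimal sketch for isolating the ID — the class and method names here are illustrative, not Cassandra code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentId {
    // Matches names like CommitLog-4-1445474832694.log:
    // group 1 = commit log version, group 2 = segment ID.
    private static final Pattern NAME =
        Pattern.compile("CommitLog-(\\d+)-(\\d+)\\.log");

    static long idOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches())
            throw new IllegalArgumentException("not a commit log file: " + fileName);
        return Long.parseLong(m.group(2));
    }

    public static void main(String[] args) {
        // The last segment the replayer touched before the exception above.
        System.out.println(idOf("CommitLog-4-1445474832694.log")); // 1445474832694
    }
}
```

Comparing the parsed IDs against the last "Replaying ..." line before the exception identifies the suspect segment: here the two earlier segments finished ("Finished reading"), and ...694 was being replayed when the ArrayIndexOutOfBoundsException fired.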
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974323#comment-14974323 ] Jeff Griffith commented on CASSANDRA-10579: --- Thanks for the refinement. I'll check on the assertions. > IndexOutOfBoundsException during memtable flushing at startup (with > offheap_objects) > > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: 2.1.10 on linux >Reporter: Jeff Griffith >Assignee: Benedict > Fix For: 2.1.x > > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying > 
/home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > 
~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_31] > at >
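The description above mentions tracking down the offending commit log via its segment ID; that ID is simply the trailing number in the replayed CommitLog-<version>-<id>.log filenames. A throwaway sketch for matching a segment by ID — the filename pattern is taken from the log lines above, not from any Cassandra API:

```python
import re

def segment_id(path: str) -> int:
    """Parse the segment id out of a CommitLog-<version>-<id>.log filename."""
    m = re.search(r"CommitLog-\d+-(\d+)\.log$", path)
    if m is None:
        raise ValueError(f"not a commit log segment: {path}")
    return int(m.group(1))

print(segment_id("/home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log"))
# 1445474832694
```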
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974335#comment-14974335 ] Jeff Griffith commented on CASSANDRA-10579: --- Yes, it doesn't look like we have -ea in our jvm opts.
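For context, assertions run only when the JVM is started with `-ea` (or `-enableassertions`), which Cassandra normally sets through JVM_OPTS in conf/cassandra-env.sh (that path is an assumption; it varies by package). A trivial sketch of the check against an options string — the sample JVM_OPTS values below are hypothetical, and per-package `-ea:<package>` forms are ignored for simplicity:

```python
def assertions_enabled(jvm_opts: str) -> bool:
    """True if the JVM options string turns assertions on globally (-ea/-enableassertions)."""
    flags = jvm_opts.split()
    return "-ea" in flags or "-enableassertions" in flags

# Hypothetical options strings; inspect the real flags with `ps` on the running node.
print(assertions_enabled("-Xms4G -Xmx4G -XX:+UseConcMarkSweepGC"))  # False
print(assertions_enabled("-Xms4G -Xmx4G -ea"))                      # True
```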
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974357#comment-14974357 ] Jeff Griffith commented on CASSANDRA-10579: --- I re-enabled assertions on this node and here is the first:
{code}
WARN [SharedPool-Worker-7] 2015-10-26 15:10:44,777 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-7,5,main]: {}
java.lang.AssertionError: null
    at org.apache.cassandra.db.AbstractNativeCell.checkPosition(AbstractNativeCell.java:585) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.AbstractNativeCell.getByteBuffer(AbstractNativeCell.java:657) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.AbstractNativeCell.get(AbstractNativeCell.java:304) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.AbstractNativeCell.get(AbstractNativeCell.java:291) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.AbstractNativeCell.sizeOf(AbstractNativeCell.java:132) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.AbstractNativeCell.<init>(AbstractNativeCell.java:120) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.NativeCell.<init>(NativeCell.java:40) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.utils.memory.NativeAllocator.clone(NativeAllocator.java:72) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.NativeCell.localCopy(NativeCell.java:64) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:445) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:418) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.utils.btree.NodeBuilder.addNewKey(NodeBuilder.java:322) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:190) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.Memtable.put(Memtable.java:210) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_31]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]
{code}
[jira] [Comment Edited] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974335#comment-14974335 ] Jeff Griffith edited comment on CASSANDRA-10579 at 10/26/15 2:57 PM: - Yes i think they are disabled. It doesn't look like we have -ea in our jvm opts.
[jira] [Commented] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup (with offheap_objects)
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974622#comment-14974622 ] Jeff Griffith commented on CASSANDRA-10579: --- Great thanks [~benedict]. i'll merge both changes in and give it a try.
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971417#comment-14971417 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/23/15 5:34 PM: - [~krummas] [~tjake] Here is a separate instance of commit logs breaking our 12G setting but with different behavior. I have captured the whole thing with thread dumps and tpstats every two minutes. I've embedded pending numbers in the filenames for your convenience to make it easy to see where the backup starts. *-node1.tar.gz is the only one i uploaded since the files were so large, but note in the Dashboard.jpg file that all three nodes break the limit at about the same time. I can upload the others if it is useful. This case seems different from the previous case where there were lots of L0 files causing thread blocking, but even here it seems like the MemtablePostFlush is stopping on a countdownlatch. https://issues.apache.org/jira/secure/attachment/12768344/MultinodeCommitLogGrowth-node1.tar.gz This happened twice during this period and here is the first one. Note the pid changed because our monitoring detected and restarted the node. 
{code}
tpstats_20151023-00:16:02_pid_37996_postpend_0.txt
tpstats_20151023-00:18:08_pid_37996_postpend_1.txt
tpstats_20151023-00:20:14_pid_37996_postpend_0.txt
tpstats_20151023-00:22:19_pid_37996_postpend_3.txt
tpstats_20151023-00:24:25_pid_37996_postpend_133.txt
tpstats_20151023-00:26:30_pid_37996_postpend_809.txt
tpstats_20151023-00:28:35_pid_37996_postpend_1596.txt
tpstats_20151023-00:30:39_pid_37996_postpend_2258.txt
tpstats_20151023-00:32:42_pid_37996_postpend_3095.txt
tpstats_20151023-00:34:45_pid_37996_postpend_3822.txt
tpstats_20151023-00:36:48_pid_37996_postpend_4593.txt
tpstats_20151023-00:38:52_pid_37996_postpend_5363.txt
tpstats_20151023-00:40:55_pid_37996_postpend_6212.txt
tpstats_20151023-00:42:59_pid_37996_postpend_7137.txt
tpstats_20151023-00:45:03_pid_37996_postpend_8559.txt
tpstats_20151023-00:47:06_pid_37996_postpend_9060.txt
tpstats_20151023-00:49:09_pid_37996_postpend_9060.txt
tpstats_20151023-00:51:11_pid_48196_postpend_0.txt
tpstats_20151023-00:53:13_pid_48196_postpend_0.txt
tpstats_20151023-00:55:16_pid_48196_postpend_0.txt
tpstats_20151023-00:57:21_pid_48196_postpend_0.txt
{code}
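Since the pending counts are embedded in those filenames, the growth curve of MemtablePostFlush backlog can be recovered without opening the files. A small sketch — the postpend_<N> suffix convention is taken from the listing above:

```python
import re

def pending(fname: str) -> int:
    """Extract the trailing postpend_<N> pending count from a tpstats dump name."""
    m = re.search(r"_postpend_(\d+)\.txt$", fname)
    if m is None:
        raise ValueError(f"unexpected name: {fname}")
    return int(m.group(1))

names = [
    "tpstats_20151023-00:22:19_pid_37996_postpend_3.txt",
    "tpstats_20151023-00:24:25_pid_37996_postpend_133.txt",
    "tpstats_20151023-00:49:09_pid_37996_postpend_9060.txt",
]
print([pending(n) for n in names])  # [3, 133, 9060]
```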
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971417#comment-14971417 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/23/15 5:33 PM: - [~krummas] [~tjake] Here is a separate instance of commit logs breaking our 12G setting but with different behavior. I have captured the whole thing with thread dumps and tpstats every two minutes. I've embedded pending numbers in the filenames for your convenience to make it easy to see where the backup starts. *-node1.tar.gz is the only one i uploaded since the files were so large, but note in the Dashboard.jpg file that all three nodes break the limit at about the same time. I can upload the others if it is useful. This case seems different from the previous case where there were lots of L0 files causing thread blocking, but even here it seems like the MemtablePostFlush is stopping on a countdownlatch. https://issues.apache.org/jira/secure/attachment/12768344/MultinodeCommitLogGrowth-node1.tar.gz This happened twice during this period and here is the first one. Note the pid changed because our monitoring detected and restarted the node. 
{code} -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:16 tpstats_20151023-00:16:02_pid_37996_postpend_0.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:18 tpstats_20151023-00:18:08_pid_37996_postpend_1.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:20 tpstats_20151023-00:20:14_pid_37996_postpend_0.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:22 tpstats_20151023-00:22:19_pid_37996_postpend_3.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:24 tpstats_20151023-00:24:25_pid_37996_postpend_133.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:26 tpstats_20151023-00:26:30_pid_37996_postpend_809.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:28 tpstats_20151023-00:28:35_pid_37996_postpend_1596.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:30 tpstats_20151023-00:30:39_pid_37996_postpend_2258.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:32 tpstats_20151023-00:32:42_pid_37996_postpend_3095.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:34 tpstats_20151023-00:34:45_pid_37996_postpend_3822.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:36 tpstats_20151023-00:36:48_pid_37996_postpend_4593.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:38 tpstats_20151023-00:38:52_pid_37996_postpend_5363.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:40 tpstats_20151023-00:40:55_pid_37996_postpend_6212.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:43 tpstats_20151023-00:42:59_pid_37996_postpend_7137.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:45 tpstats_20151023-00:45:03_pid_37996_postpend_8559.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2002 Oct 22 20:47 tpstats_20151023-00:47:06_pid_37996_postpend_9060.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2002 Oct 22 20:49 tpstats_20151023-00:49:09_pid_37996_postpend_9060.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2002 Oct 22 20:51 tpstats_20151023-00:51:11_pid_48196_postpend_0.txt -rw-r--r-- 1 jgriffith 
Y\Domain Users 2002 Oct 22 20:53 tpstats_20151023-00:53:13_pid_48196_postpend_0.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:55 tpstats_20151023-00:55:16_pid_48196_postpend_0.txt -rw-r--r-- 1 jgriffith Y\Domain Users 2180 Oct 22 20:57 tpstats_20151023-00:57:21_pid_48196_postpend_0.txt {code} was (Author: jeffery.griffith): [~krummas] [~tjake] Here is a separate instance of commit logs breaking our 12G setting but with different behavior. I have captured the whole thing with thread dumps and tpstats every two minutes. I've embedded pending numbers in the filenames for your convenience to make it easy to see where the backup starts. *-node1.tar.gz is the only one i uploaded since the files were so large, but note in the Dashboard.jpg file that all three nodes break the limit at about the same time. I can upload the others if it is useful. This case seems different from the previous case where there were lots of L0 files causing thread blocking, but even here it seems like the MemtablePostFlush is stopping on a countdownlatch. https://issues.apache.org/jira/secure/attachment/12768344/MultinodeCommitLogGrowth-node1.tar.gz > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog,
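The pending counts embedded in the snapshot filenames above can be pulled back out to build a quick timeline of the backup. A minimal sketch, assuming the `tpstats_<timestamp>_pid_<pid>_postpend_<pending>.txt` naming scheme from the listing:

```shell
# List each tpstats snapshot with its embedded pending count, parsed from the
# tpstats_<timestamp>_pid_<pid>_postpend_<N>.txt naming scheme used above.
for f in tpstats_*_postpend_*.txt; do
  pending=${f##*_postpend_}   # strip everything up to the pending number
  pending=${pending%.txt}     # strip the .txt suffix
  printf '%s %s\n' "$pending" "$f"
done | sort -n                # sort by pending count to see where the backup starts
```

Sorting numerically by the extracted count makes the jump from single digits to thousands easy to spot.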
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971417#comment-14971417 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/23/15 5:31 PM: - [~krummas] [~tjake] Here is a separate instance of commit logs breaking our 12G setting but with different behavior. I have captured the whole thing with thread dumps and tpstats every two minutes. I've embedded pending numbers in the filenames for your convenience to make it easy to see where the backup starts. *-node1.tar.gz is the only one i uploaded since the files were so large, but note in the Dashboard.jpg file that all three nodes break the limit at about the same time. I can upload the others if it is useful. This case seems different from the previous case where there were lots of L0 files causing thread blocking, but even here it seems like the MemtablePostFlush is stopping on a countdownlatch. https://issues.apache.org/jira/secure/attachment/12768344/MultinodeCommitLogGrowth-node1.tar.gz was (Author: jeffery.griffith): [~krummas] [~tjake] Here is a separate instance of commit logs breaking our 12G setting but with different behavior. I have captured the whole thing with thread dumps and tpstats every two minutes. I've embedded pending numbers in the filenames for your convenience to make it easy to see where the backup starts. *-node1.tar.gz is the only one i uploaded since the files were so large, but note in the Dashboard.jpg file that all three nodes break the limit at about the same time. I can upload the others if it is useful. This case seems different from the previous case where there were lots of L0 files causing thread blocking, but even here it seems like the MemtablePostFlush is stopping on a countdownlatch. 
> Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, > CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, > cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. > {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515: -- Attachment: MultinodeCommitLogGrowth-node1.tar.gz [~krummas] [~tjake] Here is a separate instance of commit logs breaking our 12G setting but with different behavior. I have captured the whole thing with thread dumps and tpstats every two minutes. I've embedded pending numbers in the filenames for your convenience to make it easy to see where the backup starts. *-node1.tar.gz is the only one i uploaded since the files were so large, but note in the Dashboard.jpg file that all three nodes break the limit at about the same time. I can upload the others if it is useful. This case seems different from the previous case where there were lots of L0 files causing thread blocking, but even here it seems like the MemtablePostFlush is stopping on a countdownlatch. > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, > CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, > cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. 
Here is > the nodetool compactionstats immediately after restart. > {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10579) IndexOutOfBoundsException during memtable flushing at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10579: -- Summary: IndexOutOfBoundsException during memtable flushing at startup (was: IndexOutOfBoundsException) > IndexOutOfBoundsException during memtable flushing at startup > - > > Key: CASSANDRA-10579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: 2.1.10 on linux >Reporter: Jeff Griffith > > Sometimes we have problems at startup where memtable flushes with an index > out of bounds exception as seen below. Cassandra is then dead in the water > until we track down the corresponding commit log via the segment ID and > remove it: > {code} > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log > INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, > messaging version 8) > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished > reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying > /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log > INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying > 
/home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, > messaging version 8) > WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 > AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-5,5,main]: {} > java.lang.ArrayIndexOutOfBoundsException: 6 > at > org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Memtable.put(Memtable.java:210) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > 
~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_31] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) >
[jira] [Created] (CASSANDRA-10579) IndexOutOfBoundsException
Jeff Griffith created CASSANDRA-10579: - Summary: IndexOutOfBoundsException Key: CASSANDRA-10579 URL: https://issues.apache.org/jira/browse/CASSANDRA-10579 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.1.10 on linux Reporter: Jeff Griffith Sometimes we have problems at startup where memtable flushes with an index out of bounds exception as seen below. Cassandra is then dead in the water until we track down the corresponding commit log via the segment ID and remove it: {code} INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:267 - Replaying /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log INFO [main] 2015-10-23 14:43:36,440 CommitLogReplayer.java:270 - Replaying /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log (CL version 4, messaging version 8) INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:478 - Finished reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832692.log INFO [main] 2015-10-23 14:43:36,594 CommitLogReplayer.java:267 - Replaying /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log INFO [main] 2015-10-23 14:43:36,595 CommitLogReplayer.java:270 - Replaying /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log (CL version 4, messaging version 8) INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:478 - Finished reading /home/y/var/cassandra/commitlog/CommitLog-4-1445474832693.log INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:267 - Replaying /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log INFO [main] 2015-10-23 14:43:36,699 CommitLogReplayer.java:270 - Replaying /home/y/var/cassandra/commitlog/CommitLog-4-1445474832694.log (CL version 4, messaging version 8) WARN [SharedPool-Worker-5] 2015-10-23 14:43:36,747 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-5,5,main]: {} java.lang.ArrayIndexOutOfBoundsException: 6 at 
org.apache.cassandra.db.AbstractNativeCell.nametype(AbstractNativeCell.java:204) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.AbstractNativeCell.isStatic(AbstractNativeCell.java:199) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.composites.AbstractCType.compare(AbstractCType.java:166) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:61) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.composites.AbstractCellNameType$1.compare(AbstractCellNameType.java:58) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.utils.btree.BTree.find(BTree.java:277) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:154) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:225) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.Memtable.put(Memtable.java:210) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1225) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_31] at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.10.jar:2.1.10-SNAPSHOT] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
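Tracking down the offending commit log "via the segment ID" as described above is mechanical, because the segment ID is part of the `CommitLog-<version>-<segmentID>.log` filename. A sketch, using the commit log directory and segment ID from the replay log lines above; adjust both for your environment:

```shell
# Locate a commit log segment by its id so it can be moved aside before restart.
# Directory and segment id are taken from the log excerpt above; adjust as needed.
CLDIR=/home/y/var/cassandra/commitlog
SEGMENT=1445474832694
ls "$CLDIR"/CommitLog-*-"$SEGMENT".log
# Prefer moving it aside over deleting, so it can be inspected later:
# mv "$CLDIR"/CommitLog-*-"$SEGMENT".log /tmp/
```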
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965636#comment-14965636 ] Jeff Griffith commented on CASSANDRA-10515: --- Hi again [~krummas] Before trying the leveling, the remaining problematic clusters seemed to work through the L0 file-count problem. They were all trending downward, but there were several days where it was very frequent. Alas, the isolated node with large SSTable counts does not seem to be the only case where commit logs break the limit. I'm tempted to open this as a separate issue, but let's see what you think first. In some cases, we see all 3 nodes in those small clusters break the limit at the same time. I will do better monitoring, but I did manage to catch one in progress and here is what I observed. There were not a lot of blocked threads like before, but MemtablePostFlush was blocked on the CountDownLatch. So here are the tpstats for that: {code} MemtableFlushWriter 830 7200 0 0 MemtablePostFlush 1 45879 16841 0 0 MemtableReclaimMemory 0 0 7199 0 0 {code} With 46K pending. 
The only thread I see for that is here: {code} "MemtablePostFlush:3" #3054 daemon prio=5 os_prio=0 tid=0x7f806fb71000 nid=0x2e5c waiting on condition [0x7f804366c000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0005de8976f8> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) at org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:998) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} I don't know who counts that latch down, but there were a couple of blocked threads, here: {code} "HintedHandoff:2" #1429 daemon prio=1 os_prio=4 tid=0x7f80895c4800 nid=0x1242 waiting for monitor entry [0x7f804321b000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.cassandra.db.HintedHandOffManager.compact(HintedHandOffManager.java:267) - waiting to lock <0x0004e2e689a8> (a org.apache.cassandra.db.HintedHandOffManager) at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:561) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) "HintedHandoff:1" #1428 daemon prio=1 os_prio=4 tid=0x7f80895c3800 nid=0x1241 waiting for monitor entry [0x7f7838855000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.cassandra.db.HintedHandOffManager.compact(HintedHandOffManager.java:267) - waiting to lock <0x0004e2e689a8> (a org.apache.cassandra.db.HintedHandOffManager) at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:561) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} and the lock was held here: {code} "HintedHandoffManager:1" #1430 daemon prio=1 os_prio=4 tid=0x7f808aaf1800 nid=0x1243 waiting on condition [0x7f8043423000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00060bdc0b98> (a java.util.concurrent.FutureTask) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) at java.util.concurrent.FutureTask.get(FutureTask.java:191) at
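Threads parked on a CountDownLatch, like the MemtablePostFlush thread in the dump above, can be picked out of a full thread dump with grep. A sketch, assuming the dump was already captured to `threads.txt` (e.g. with `jstack <pid> > threads.txt`, where `<pid>` is the Cassandra process id):

```shell
# threads.txt is a full thread dump, captured beforehand with something like:
#   jstack <pid> > threads.txt
# Show the header line of every thread parked on a CountDownLatch; the pool name
# in the header (e.g. "MemtablePostFlush:3") identifies which pool is stuck.
grep -B 6 'CountDownLatch$Sync' threads.txt | grep '^"'
```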
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960606#comment-14960606 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:47 PM: -- thanks [~krummas] see cfstats-clean.txt which i obfuscated and uploaded. we didn't actually name them CF001 ;-) For your convenience I grabbed the sstable counts > 500: SSTable count: 3454 SSTable count: 55392 <--- SSTable count: 687 was (Author: jeffery.griffith): thanks [~krummas] see cfstats-clean.txt which i obfuscated and uploaded. we didn't actually name them CF001 ;-) > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, > RUN3tpstats.jpg, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
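The "sstable counts > 500" summary above can be extracted from the attached cfstats dump mechanically. A sketch, assuming the plain `nodetool cfstats` output format with `cfstats-clean.txt` standing in for the attachment:

```shell
# Pull every "SSTable count" line from a saved cfstats dump and keep only the
# tables with more than 500 sstables, as summarized in the comment above.
awk -F': *' '/SSTable count/ && $2 > 500 {print $2}' cfstats-clean.txt
```

Splitting on `": *"` makes `$2` the numeric count, so the comparison is a plain numeric filter.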
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515: -- Attachment: cfstats-clean.txt > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, > RUN3tpstats.jpg, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960606#comment-14960606 ] Jeff Griffith commented on CASSANDRA-10515: --- thanks [~krummas] see cfstats-clean.txt which i obfuscated and uploaded. we didn't actually name them CF001 ;-) > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, > RUN3tpstats.jpg, cfstats-clean.txt, stacktrace.txt, system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515: -- Attachment: cassandra.yaml > Commit logs back up with move to 2.1.10 > --- > > Key: CASSANDRA-10515 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10515 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: redhat 6.5, cassandra 2.1.10 >Reporter: Jeff Griffith >Assignee: Branimir Lambov >Priority: Critical > Labels: commitlog, triage > Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, > RUN3tpstats.jpg, cassandra.yaml, cfstats-clean.txt, stacktrace.txt, > system.log.clean > > > After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems > where some nodes break the 12G commit log max we configured and go as high as > 65G or more before it restarts. Once it reaches the state of more than 12G > commit log files, "nodetool compactionstats" hangs. Eventually C* restarts > without errors (not sure yet whether it is crashing but I'm checking into it) > and the cleanup occurs and the commit logs shrink back down again. Here is > the nodetool compactionstats immediately after restart. 
> {code} > jgriffith@prod1xc1.c2.bf1:~$ ndc > pending tasks: 2185 >compaction type keyspace table completed > totalunit progress > Compaction SyncCore *cf1* 61251208033 > 170643574558 bytes 35.89% > Compaction SyncCore *cf2* 19262483904 > 19266079916 bytes 99.98% > Compaction SyncCore *cf3*6592197093 > 6592316682 bytes100.00% > Compaction SyncCore *cf4*3411039555 > 3411039557 bytes100.00% > Compaction SyncCore *cf5*2879241009 > 2879487621 bytes 99.99% > Compaction SyncCore *cf6* 21252493623 > 21252635196 bytes100.00% > Compaction SyncCore *cf7* 81009853587 > 81009854438 bytes100.00% > Compaction SyncCore *cf8*3005734580 > 3005768582 bytes100.00% > Active compaction remaining time :n/a > {code} > I was also doing periodic "nodetool tpstats" which were working but not being > logged in system.log on the StatusLogger thread until after the compaction > started working again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960606#comment-14960606 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:51 PM:
----------------------------------------------------------------------
thanks [~krummas] see cfstats-clean.txt which i obfuscated and uploaded. we didn't actually name them CF001 ;-) For your convenience I grabbed the sstable counts > 500:
SSTable count: 3454
SSTable count: 55392 <---
SSTable count: 687
Also, I've attached our cassandra.yaml

> Commit logs back up with move to 2.1.10
> ---------------------------------------
>
>                 Key: CASSANDRA-10515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: redhat 6.5, cassandra 2.1.10
>            Reporter: Jeff Griffith
>            Assignee: Branimir Lambov
>            Priority: Critical
>              Labels: commitlog, triage
>         Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, RUN3tpstats.jpg, cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more before it restarts. Once it reaches the state of more than 12G commit log files, "nodetool compactionstats" hangs. Eventually C* restarts without errors (not sure yet whether it is crashing but I'm checking into it) and the cleanup occurs and the commit logs shrink back down again. Here is the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table     completed          total   unit   progress
>         Compaction   SyncCore   *cf1*   61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*   19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*    6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*    3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*    2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*   21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*   81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*    3005734580     3005768582   bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being logged in system.log on the StatusLogger thread until after the compaction started working again.
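Pulling the per-table "SSTable count" lines out of a saved `nodetool cfstats` dump, as done above, is easy to script. A minimal sketch; the sample text and the 500 threshold are illustrative assumptions, not taken from the attached cfstats-clean.txt:

```python
import re

def large_sstable_counts(text, threshold=500):
    """Return SSTable counts above `threshold` from cfstats-style output."""
    counts = []
    for line in text.splitlines():
        # cfstats prints one "SSTable count: N" line per table
        m = re.search(r"SSTable count:\s*(\d+)", line)
        if m and int(m.group(1)) > threshold:
            counts.append(int(m.group(1)))
    return counts

# Made-up sample in the shape of `nodetool cfstats` output
sample = """Table: cf1
SSTable count: 3454
Table: cf2
SSTable count: 55392
Table: cf3
SSTable count: 12
"""
print(large_sstable_counts(sample))  # -> [3454, 55392]
```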
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960606#comment-14960606 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:59 PM:
----------------------------------------------------------------------
thanks [~krummas] see cfstats-clean.txt which i obfuscated and uploaded. we didn't actually name them CF001 ;-) For your convenience I grabbed the sstable counts > 500:
SSTable count: 3454
SSTable count: 55392 <--- indeed this is NOT the case on other nodes
SSTable count: 687
Also, I've attached our cassandra.yaml
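The "12G commit log max we configured" from the issue description corresponds to the commit log space cap in cassandra.yaml. A hedged fragment; the actual value in the attached cassandra.yaml is not reproduced here, 12288 MB is simply 12G expressed in the unit the setting uses:

```yaml
# Assumed fragment matching the 12G cap described in the ticket.
# Cassandra trims old commit log segments once total size exceeds this.
commitlog_total_space_in_mb: 12288
```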
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960663#comment-14960663 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 1:14 PM:
---------------------------------------------------------------------
yes, we just upgraded from 2.0. would that explain the 50k+? i have checked and that is NOT the case on the other nodes. yes, they are balanced in terms of data (40 core machines with lots of memory). in this stage of our rollout to 2.1, we have gone to 10 small clusters of 3 nodes each (30 nodes total). only THREE nodes of the thirty are now exhibiting this behavior. for the first few days, several others did however they seem to have self corrected. I will go back and check for large sstable counts to see if that explains all of them. after this first stage, we'll be rolling out to the larger 24-node clusters but we are pausing here on the small clusters until we figure this out.
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515:
--------------------------------------
    Attachment: C5commitLogIncrease.jpg

[~krummas] I checked a different cluster for sstable counts. (C5commitLogIncrease.jpg) Here they all decided to break the limit at the same time. The largest sstable count in each is about 5K.
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960663#comment-14960663 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 1:16 PM:
---------------------------------------------------------------------
yes, we just upgraded from 2.0. would that explain the 50k+? i have checked and that is NOT the case on the other nodes. yes, they are balanced in terms of data (40 core machines with lots of memory). in this stage of our rollout to 2.1, we have gone to 10 small clusters of 3 nodes each (30 nodes total). only THREE nodes of the thirty are now exhibiting this behavior. for the first few days, several others did however they seem to have self corrected. these three still have not. I will go back and check for large sstable counts to see if that explains all of them. after this first stage, we'll be rolling out to the larger 24-node clusters but we are pausing here on the small clusters until we figure this out.
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960755#comment-14960755 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 2:07 PM:
---------------------------------------------------------------------
[~krummas] I checked a different cluster for sstable counts. (see C5commitLogIncrease.jpg) Here they all decided to break the limit at the same time. The largest sstable count in each is about 5K.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960643#comment-14960643 ] Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------
No, that is one node.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960650#comment-14960650 ] Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------
Indeed the symptoms here look like the other jira you mentioned. I have followed the thread dumps over time and it looks very much like it's spending a lot of time in the "overlapping" calculation as you see above.
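Following thread dumps over time, as described above, can be partly automated by counting how often each stack frame recurs across successive dumps. A rough sketch; the dump text below is a made-up stand-in for real `jstack` output, and the frame names are illustrative:

```python
from collections import Counter

def hot_frames(dumps, top=3):
    """Count 'at ...' stack-frame lines across several thread-dump texts."""
    counter = Counter()
    for dump in dumps:
        for line in dump.splitlines():
            line = line.strip()
            # jstack-style dumps render each frame as "at pkg.Class.method(...)"
            if line.startswith("at "):
                counter[line] += 1
    return counter.most_common(top)

# Made-up fragment in the shape of a jstack thread dump
dump = """"CompactionExecutor:1" daemon prio=10 RUNNABLE
   at org.apache.cassandra.db.compaction.CompactionController.maybeGetExpired(...)
   at org.apache.cassandra.io.sstable.SSTableReader.getScanner(...)
"""
for frame, n in hot_frames([dump, dump]):
    print(n, frame)
```

Frames that stay at the top across many dumps are the ones the process is actually spending its time in, which is how the "overlapping" calculation stood out here.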
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960663#comment-14960663 ] Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------
yes, we just upgraded from 2.0. would that explain the 50k+? i have checked and that is NOT the case on the other nodes. yes, they are balanced in terms of data (40 core machines with lots of memory). in this stage of our rollout to 2.1, we have gone to 10 small clusters of 3 nodes each (30 nodes total). only THREE nodes of the thirty are now exhibiting this behavior. for the first few days, several others did however they seem to have self corrected. I will go back and check for large sstable counts to see if that explains all of them. after this first stage, we'll be rolling out to the larger 24-node clusters but we are pausing here on the small clusters until we figure this out.
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960663#comment-14960663 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 1:17 PM:
---------------------------------------------------------------------
yes, we just upgraded from 2.0. would that explain the 50k+? i will dig deeper on that CF. i have checked and that is NOT the case on the other nodes. yes, they are balanced in terms of data (40 core machines with lots of memory). in this stage of our rollout to 2.1, we have gone to 10 small clusters of 3 nodes each (30 nodes total). only THREE nodes of the thirty are now exhibiting this behavior. for the first few days, several others did however they seem to have self corrected. these three still have not. I will go back and check for large sstable counts to see if that explains all of them. after this first stage, we'll be rolling out to the larger 24-node clusters but we are pausing here on the small clusters until we figure this out.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960678#comment-14960678 ] Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------
great, thanks [~krummas]. please let me know if there is any more information i can provide to help resolve it. i'll get more info across the 30 nodes on large sstables and make sure this correlates to the problem.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960763#comment-14960763 ] Jeff Griffith commented on CASSANDRA-10515:
-------------------------------------------
No, this is only the beginning (300M users and a petabyte to go :) ). We were kind of pausing here since the bigger clusters carry so many more users but we can definitely do that if we move forward with the rollout. We'll apply the releveling where we can and see how it behaves. is the 5K sstable count enough to be concerned about? i'll do some more analysis on these and compare to clusters that have not yet been upgraded.
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960763#comment-14960763 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 2:17 PM: - No, this is only the beginning (300M users and a petabyte to go :) ). We were kind of pausing here since the bigger clusters carry so many more users, but we can definitely do that if we move forward with the rollout. We'll apply the releveling where we can and see how it behaves. Is the 5K sstable count enough to be concerned about? I'll do some more analysis on these and compare to clusters that have not yet been upgraded. was (Author: jeffery.griffith): No, this is only the beginning (300M users and a petabyte to go :) ) . We were kind of pausing here since the bigger clusters carry so many more users but we can definitely do that if we move forward with the rollout. We'll apply the releveling where can and see how it behaviors. is the 5K sstable count enough to be concerned about? i'll do some more analysis on these and compare to clusters that have not yet been upgraded.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960769#comment-14960769 ] Jeff Griffith commented on CASSANDRA-10515: --- OK, that's good news then. We'll apply tools/bin/sstableofflinerelevel to everything above 1K or so?
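The "everything above 1K or so" cutoff amounts to a simple filter over per-table sstable counts. A hedged sketch (the table names, counts, and helper class are hypothetical; the counts themselves would come from something like nodetool cfstats, and tools/bin/sstableofflinerelevel is then run per table while the node is down):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: pick tables whose sstable count exceeds a threshold,
// i.e. the candidates for tools/bin/sstableofflinerelevel.
class RelevelCandidates {
    static List<String> candidates(Map<String, Integer> sstableCounts, int threshold) {
        return sstableCounts.entrySet().stream()
                .filter(e -> e.getValue() > threshold)   // "above 1K or so"
                .map(Map.Entry::getKey)
                .sorted()                                // stable output order
                .collect(Collectors.toList());
    }
}
```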
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960769#comment-14960769 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 2:20 PM: - OK, that's good news then, although the weird synchronization of that last example concerns me. We'll apply tools/bin/sstableofflinerelevel to everything above 1K or so? was (Author: jeffery.griffith): ok, that's good news then. we'll apply tools/bin/sstableofflinerelevel to everything above 1K or so?
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515: -- Attachment: RUN3tpstats.jpg [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments. Overview: monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly, and then post-flush operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked, so I assume (?) the locker never completed: {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code}
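The RUNNABLE frames above sit in Bounds.contains/Bounds.intersects while holding the LeveledManifest monitor, so every other consumer of the compaction strategy queues behind a linear scan over sstable bounds. A simplified version of that overlap test, using inclusive integer bounds as a stand-in for token ranges (not the real implementation, just the shape of the work the stack shows):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for org.apache.cassandra.dht.Bounds: an inclusive
// [left, right] range with contains/intersects checks like the stack shows.
class Bounds {
    final long left, right;

    Bounds(long left, long right) {
        this.left = left;
        this.right = right;
    }

    boolean contains(long position) {
        return left <= position && position <= right;
    }

    // Two inclusive ranges intersect iff either contains the other's left end.
    boolean intersects(Bounds other) {
        return contains(other.left) || other.contains(left);
    }

    // LeveledManifest.overlapping-style scan: O(n) over candidate sstable
    // bounds per call, which is why a very large sstable count can keep
    // CompactionExecutor pinned here while it holds the manifest lock.
    static List<Bounds> overlapping(Bounds range, List<Bounds> sstables) {
        List<Bounds> out = new ArrayList<>();
        for (Bounds b : sstables)
            if (range.intersects(b)) out.add(b);
        return out;
    }
}
```

With thousands of sstables per level, this scan runs repeatedly inside the locked getCompactionCandidates path, consistent with the blocked getNextBGTask/getEstimatedRemTask callers in the tpstats observations.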
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959885#comment-14959885 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/16/15 12:14 AM: -- [~tjake] I monitored live for a few hours to capture the behavior. See RUN3tpstats.jpg in the attachments. Overview: monitoring threads began to block before the memtable flushing did. Memtable flushing seemed to be progressing slowly, and then post-flush operations began to pile up. The primary things blocked were: 1. MemtableFlushWriter/handleNotif 2. CompactionExec/getNextBGTask 3. ServiceThread/getEstimatedRemTask Those three blocked and never came unblocked, so I assume (?) the locker never completed or was very, very slow. Eventually a second MemtableFlushWriter thread blocks; I believe that if I let it continue to run, all or many of them will. {code} "CompactionExecutor:18" #1462 daemon prio=1 os_prio=4 tid=0x7fd96141 nid=0x728b runnable [0x7fda4ae0b000] java.lang.Thread.State: RUNNABLE at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49) at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511) at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497) at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572) at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346) - locked <0x0004a8bc5038> (a org.apache.cassandra.db.compaction.LeveledManifest) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101) at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90) - locked <0x0004a8af17d0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy) at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) - locked <0x0004a894df10> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} I see one thread for MemtablePostFlush and this is it: {code} "MemtablePostFlush:8" #4866 daemon prio=5 os_prio=0 tid=0x7fd91c0c5800 nid=0x2d93 waiting on condition [0x7fda4b46c000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0005838ba468> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) at org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:998) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} I followed it for a while longer after this and it really looks like the post flush stays blocked on that latch forever: {code} 00:01 MemtableFlushWriter 2 2 2024 0 0 MemtablePostFlush 1 47159 4277 0 0 MemtableReclaimMemory 0 0 2024 0 0 00:03 MemtableFlushWriter 3 3 2075 0 0 MemtablePostFlush
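To illustrate what that WAITING (parking) state means: per the trace, ColumnFamilyStore$PostFlush.run is parked on a CountDownLatch, so if whatever should count it down never finishes, the post-flush task waits forever. A minimal, self-contained sketch of that pattern (names and counts are illustrative, not Cassandra's actual code; a timeout is used here only so the sketch terminates):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PostFlushSketch {
    // Returns true if the "post flush" was released, i.e. every writer counted down.
    static boolean runPostFlush(int writers, boolean oneWriterHangs) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(writers);
        for (int i = 0; i < writers; i++) {
            final boolean hang = oneWriterHangs && i == 0;
            new Thread(() -> {
                if (!hang) latch.countDown();   // a hung writer never counts down
            }).start();
        }
        // The real PostFlush.run() effectively does latch.await() with no timeout,
        // which is why the thread in the dump parks indefinitely.
        return latch.await(200, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runPostFlush(3, false)); // all writers finish -> true
        System.out.println(runPostFlush(3, true));  // one writer hangs   -> false
    }
}
```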
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959525#comment-14959525 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/15/15 8:06 PM: - Yeah, it doesn't look like the locking thread is deadlocked at all. I know this is a stretch, but considering we just migrated from 2.0.x, could there be something data-specific that is confusing the compaction? Not sure where to check for slow flushes. Should I just watch tpstats?

was (Author: jeffery.griffith): Yeah doesn't look blocked. How can i check for the slow flushes?

> Commit logs back up with move to 2.1.10
> ---------------------------------------
>
>                 Key: CASSANDRA-10515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: redhat 6.5, cassandra 2.1.10
>            Reporter: Jeff Griffith
>            Assignee: Branimir Lambov
>            Priority: Critical
>              Labels: commitlog, triage
>         Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, system.log.clean
>
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more before it restarts. Once it reaches the state of more than 12G commit log files, "nodetool compactionstats" hangs. Eventually C* restarts without errors (not sure yet whether it is crashing but I'm checking into it) and the cleanup occurs and the commit logs shrink back down again. Here is the nodetool compactionstats immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table     completed         total   unit   progress
>         Compaction   SyncCore   *cf1*   61251208033  170643574558  bytes     35.89%
>         Compaction   SyncCore   *cf2*   19262483904   19266079916  bytes     99.98%
>         Compaction   SyncCore   *cf3*    6592197093    6592316682  bytes    100.00%
>         Compaction   SyncCore   *cf4*    3411039555    3411039557  bytes    100.00%
>         Compaction   SyncCore   *cf5*    2879241009    2879487621  bytes     99.99%
>         Compaction   SyncCore   *cf6*   21252493623   21252635196  bytes    100.00%
>         Compaction   SyncCore   *cf7*   81009853587   81009854438  bytes    100.00%
>         Compaction   SyncCore   *cf8*    3005734580    3005768582  bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" which were working but not being
> logged in system.log on the StatusLogger thread until after the compaction
> started working again.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959608#comment-14959608 ] Jeff Griffith commented on CASSANDRA-10515: --- I had restarted, but I'll watch live on the next iteration. As you can see in the comments above, they do start piling up:
{code}
MemtableFlushWriter      1      1     1574  0  0
MemtablePostFlush        1  13755   134889  0  0
MemtableReclaimMemory    0      0     1574  0  0
{code}
In the previous iteration, there were four threads for MemtableFlushWriter, all blocked behind the runnable LeveledManifest.getCandidatesFor(LeveledManifest.java:572).
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959218#comment-14959218 ] Jeff Griffith commented on CASSANDRA-10515: --- BTW, we tried commitlog_segment_recycling: false, but realized afterwards that this should already be the default. We briefly thought it made a difference after restarting that node, but the problem returned after several hours. There is some mention in another JIRA about tuning the number of memtable flush writers; could that be an issue? It's still difficult to explain why we only see this on a few nodes across the ten clusters, all with the same config. Will try to get the thread dump ASAP.
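For reference, the flush-writer knob mentioned above lives in cassandra.yaml. A hedged fragment with illustrative values only (not a tuning recommendation):

```yaml
# cassandra.yaml -- illustrative values only, not a recommendation
memtable_flush_writers: 2              # more writers can drain a flush backlog faster
commitlog_total_space_in_mb: 12288     # the 12G commit log cap described in this ticket
```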
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959208#comment-14959208 ] Jeff Griffith commented on CASSANDRA-10515: --- working on it [~mishail]
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959498#comment-14959498 ] Jeff Griffith commented on CASSANDRA-10515: --- A second iteration. I ran into a second instance of the metrics-via-RMI problem, but caught it very early, when only a few threads were blocked behind the compaction. Still looks like the same general place:
{code}
"CompactionExecutor:16" #1502 daemon prio=1 os_prio=4 tid=0x7fb78c4f2000 nid=0xf7ff runnable [0x7fb751941000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.putVal(HashMap.java:641)
        at java.util.HashMap.put(HashMap.java:611)
        at java.util.HashSet.add(HashSet.java:219)
        at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:512)
        at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
        at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
        at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
        - locked <0x0004bcf24298> (a org.apache.cassandra.db.compaction.LeveledManifest)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
        - locked <0x0004bcbec488> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
        at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
        - locked <0x0004b98f1b00> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
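The hot frames in these dumps are Bounds.intersects / LeveledManifest.overlapping: for each candidate, every sstable in the level is tested for token-range overlap, all while the strategy locks are held. A simplified, hypothetical sketch of that linear scan (illustrative only, not the actual Cassandra implementation; ranges are modeled as closed long intervals):

```java
public class OverlapSketch {
    // Closed-interval overlap test, analogous in spirit to Bounds.intersects.
    static boolean intersects(long aLeft, long aRight, long bLeft, long bRight) {
        return aLeft <= bRight && bLeft <= aRight;
    }

    // Scan every sstable's [left, right] range and count intersections with
    // the candidate range -- with thousands of sstables this linear work,
    // done under the compaction-strategy lock, keeps the executor RUNNABLE
    // while other threads queue behind the monitor.
    static int countOverlapping(long left, long right, long[][] sstables) {
        int n = 0;
        for (long[] s : sstables)
            if (intersects(left, right, s[0], s[1])) n++;
        return n;
    }

    public static void main(String[] args) {
        long[][] level = { {0, 10}, {5, 15}, {20, 30} };
        System.out.println(countOverlapping(8, 22, level)); // prints 3
    }
}
```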
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959313#comment-14959313 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/15/15 6:08 PM: - Oh look, the memtable flusher is blocked on the same lock:
{code}
"MemtableFlushWriter:1166" #18316 daemon prio=5 os_prio=0 tid=0x7f33ac5f8800 nid=0xb649 waiting for monitor entry [0x7f31c5acc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:250)
        - waiting to lock <0x000498151af8> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
        at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
{code}
I don't know how hot that particular code is, but every stack trace showed the lock holder at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77) or deeper.

was (Author: jeffery.griffith): Oh look, the memtable flusher is blocked on the same lock:
{code}
"MemtableFlushWriter:1166" #18316 daemon prio=5 os_prio=0 tid=0x7f33ac5f8800 nid=0xb649 waiting for monitor entry [0x7f31c5acc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:250)
        - waiting to lock <0x000498151af8> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
        at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
{code}
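The contention pattern in these dumps can be reproduced in miniature: while one thread holds an object's intrinsic lock inside a slow synchronized section (as getNextBackgroundTask does on the WrappingCompactionStrategy), any thread entering another synchronized method on the same object (like handleNotification) goes BLOCKED (on object monitor). A hedged, self-contained sketch, not Cassandra code:

```java
import java.util.concurrent.CountDownLatch;

public class MonitorContentionSketch {
    static final Object strategy = new Object();  // stands in for WrappingCompactionStrategy

    // Returns the state the "flusher" thread reaches while another thread
    // holds the strategy monitor -- expected: BLOCKED.
    static Thread.State observeContention() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread compaction = new Thread(() -> {    // plays CompactionExecutor
            synchronized (strategy) {             // slow work under the monitor
                lockHeld.countDown();
                try { Thread.sleep(300); } catch (InterruptedException ignored) {}
            }
        });
        compaction.start();
        lockHeld.await();                          // the monitor is now held

        Thread flusher = new Thread(() -> {        // plays MemtableFlushWriter
            synchronized (strategy) { }            // handleNotification() analogue
        });
        flusher.start();
        while (flusher.getState() != Thread.State.BLOCKED) Thread.sleep(1);
        Thread.State observed = flusher.getState();
        compaction.join();
        flusher.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observeContention());
    }
}
```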
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959279#comment-14959279 ] Jeff Griffith commented on CASSANDRA-10515: --- [~mishail] [~blambov] thread dump attached.
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959301#comment-14959301 ] Jeff Griffith commented on CASSANDRA-10515: --- I followed the thread dumps over time. A lot of metrics threads (getting estimated pending tasks) blocked behind this thread:
{code}
"CompactionExecutor:11" #1591 daemon prio=1 os_prio=4 tid=0x7f30f1338800 nid=0xba6b runnable [0x7f2e75bfd000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.cassandra.dht.Bounds.contains(Bounds.java:49)
        at org.apache.cassandra.dht.Bounds.intersects(Bounds.java:77)
        at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:511)
        at org.apache.cassandra.db.compaction.LeveledManifest.overlapping(LeveledManifest.java:497)
        at org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:572)
        at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:346)
        - locked <0x000498f172b0> (a org.apache.cassandra.db.compaction.LeveledManifest)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getMaximalTask(LeveledCompactionStrategy.java:101)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:90)
        - locked <0x0004989bb5c0> (a org.apache.cassandra.db.compaction.LeveledCompactionStrategy)
        at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84)
        - locked <0x000498151af8> (a org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959338#comment-14959338 ] Jeff Griffith commented on CASSANDRA-10515:
---
Yeah, I saw all the blocked threads behind it. I'm checking which monitoring tools aren't waiting for the previous instance to finish, but this is just an ugly side effect, isn't it? (a side effect of the lock?)

> Commit logs back up with move to 2.1.10
> ---
>
>                 Key: CASSANDRA-10515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: redhat 6.5, cassandra 2.1.10
>            Reporter: Jeff Griffith
>            Assignee: Branimir Lambov
>            Priority: Critical
>              Labels: commitlog, triage
>         Attachments: CommitLogProblem.jpg, CommitLogSize.jpg, stacktrace.txt, system.log.clean
>
> After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more before it restarts. Once it reaches the state of more than 12G of commit log files, "nodetool compactionstats" hangs. Eventually C* restarts without errors (not sure yet whether it is crashing, but I'm checking into it), the cleanup occurs, and the commit logs shrink back down again. Here is the nodetool compactionstats output immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total    unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
> Active compaction remaining time : n/a
> {code}
> I was also doing periodic "nodetool tpstats" runs, which were working but not being logged in system.log on the StatusLogger thread until after compaction started working again.
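The progress figures in the compactionstats output above are just completed bytes divided by total bytes, shown as a percentage with two decimals. A quick sanity check (plain Python, values copied from the table; the exact rounding convention is an assumption):

```python
# Recompute the "progress" column of the compactionstats output above:
# progress = completed / total, formatted as a percentage with two decimals.
rows = [
    ("*cf1*", 61251208033, 170643574558),   # table shows 35.89%
    ("*cf2*", 19262483904, 19266079916),    # table shows 99.98%
    ("*cf3*", 6592197093, 6592316682),      # table shows 100.00%
]

for name, completed, total in rows:
    print(f"{name}: {100.0 * completed / total:.2f}%")
```

Note that *cf3* displays as 100.00% even though 119,589 bytes remain, because the value rounds up at two decimals — which is why several nearly finished compactions can sit at "100.00%" while the pending-task count stays high.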
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959338#comment-14959338 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/15/15 7:19 PM:
---
Yeah, I saw all the blocked threads behind it. I'm checking which monitoring tools aren't waiting for the previous instance to finish, but this is just an ugly side effect, isn't it? (a side effect of the lock?) I will disable all monitoring and restart to be sure. (UPDATE: looks like a cron job piled those up after things got stuck. I disabled it to be sure.)

was (Author: jeffery.griffith):
Yeah, I saw all the blocked threads behind it. I'm checking which monitoring tools aren't waiting for the previous instance to finish, but this is just an ugly side effect, isn't it? (a side effect of the lock?) I will disable all monitoring and restart to be sure.
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515:
---
Attachment: stacktrace.txt
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959338#comment-14959338 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/15/15 6:18 PM:
---
Yeah, I saw all the blocked threads behind it. I'm checking which monitoring tools aren't waiting for the previous instance to finish, but this is just an ugly side effect, isn't it? (a side effect of the lock?) I will disable all monitoring and restart to be sure.

was (Author: jeffery.griffith):
Yeah, I saw all the blocked threads behind it. I'm checking which monitoring tools aren't waiting for the previous instance to finish, but this is just an ugly side effect, isn't it? (a side effect of the lock?)
[jira] [Commented] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957289#comment-14957289 ] Jeff Griffith commented on CASSANDRA-10515:
---
A test I tried: lowering the max commit log size from 12G to 6G. It respected the limit through 8 iterations, then began to grow again (same behavior).
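For reference, the 12G/6G caps being tested here correspond to `commitlog_total_space_in_mb` in cassandra.yaml. The option name is standard; the values below are an illustrative sketch of this cluster's test, not defaults:

```yaml
# cassandra.yaml — total disk space the commit log may occupy; once the
# cap is reached, Cassandra flushes the memtables pinning the oldest
# segments so those segments can be recycled.
commitlog_total_space_in_mb: 6144    # lowered from 12288 (12G) for this test
```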
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515:
---
Attachment: system.log.clean

I cleaned up a log and attached it for a period that began already in trouble but suddenly started (flushing?) and compacting again. At 11:25 there was a huge compaction. Things seemed normal for a while, but the commit logs began to grow again. Note that I have cleaned out a bunch of warnings like this because we see them everywhere:

WARN [SharedPool-Worker-4] 2015-10-14 11:21:22,940 BatchStatement.java:252 - Batch of prepared statements for [xx] is of size 5253, exceeding specified threshold of 5120 by 133.
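The BatchStatement warning being filtered out of the log here is the batch-size guard: the numbers in the message are plain byte arithmetic against the warn threshold (5120 bytes, i.e. `batch_size_warn_threshold_in_kb: 5`; the option name is standard, but the snippet below is a sketch of the check, not Cassandra's actual code):

```python
# Sketch of the check behind the quoted BatchStatement warning,
# using the values from the log line (5253-byte batch, 5120-byte threshold).
batch_size = 5253        # serialized size of the prepared-statement batch
warn_threshold = 5120    # 5 KiB warn threshold

if batch_size > warn_threshold:
    excess = batch_size - warn_threshold
    print(f"Batch of prepared statements is of size {batch_size}, "
          f"exceeding specified threshold of {warn_threshold} by {excess}.")
```

That reproduces the "by 133" in the log. In 2.1 this warning is advisory — the write still succeeds — so it doesn't by itself explain the commit-log growth.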
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515:
---
Attachment: CommitLogSize.jpg

This matches the period in the log file system.log.clean.
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956966#comment-14956966 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/14/15 1:49 PM:
---
I cleaned up a log and attached it (system.log.clean) for a period that began already in trouble but suddenly started (flushing?) and compacting again. At 11:25 there was a huge compaction. Things seemed normal for a while, but the commit logs began to grow again. Note that I have cleaned out a bunch of warnings like this because we see them everywhere:

WARN [SharedPool-Worker-4] 2015-10-14 11:21:22,940 BatchStatement.java:252 - Batch of prepared statements for [xx] is of size 5253, exceeding specified threshold of 5120 by 133.

was (Author: jeffery.griffith):
I cleaned up a log and attached it for a period that began already in trouble but suddenly started (flushing?) and compacting again. At 11:25 there was a huge compaction. Things seemed normal for a while, but the commit logs began to grow again. Note that I have cleaned out a bunch of warnings like this because we see them everywhere:

WARN [SharedPool-Worker-4] 2015-10-14 11:21:22,940 BatchStatement.java:252 - Batch of prepared statements for [xx] is of size 5253, exceeding specified threshold of 5120 by 133.
[jira] [Comment Edited] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956970#comment-14956970 ] Jeff Griffith edited comment on CASSANDRA-10515 at 10/14/15 1:49 PM:
---
CommitLogSize.jpg matches the period in the log file system.log.clean.

was (Author: jeffery.griffith):
This matches the period in the log file system.log.clean.
[jira] [Updated] (CASSANDRA-10515) Compaction hangs with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515:
---
Attachment: CommitLogProblem.jpg
[jira] [Updated] (CASSANDRA-10515) Compaction hangs with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515:
---
Description:
After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more. Once it reaches this state, "nodetool compactionstats" hangs. I watched the recovery live when compactions began happening again. The "nodetool compactionstats" suddenly completed to show the outstanding jobs, most in a 100% completion state:
{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace   table      completed          total    unit   progress
        Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
        Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
        Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
        Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
        Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
        Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
        Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
        Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
Active compaction remaining time : n/a
{code}
I was also doing periodic "nodetool tpstats" runs, which were working but not being logged in system.log on the StatusLogger thread until after compaction started working again.

was:
After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more. Once it reaches this state, "nodetool compactionstats" hangs. I watched the recovery live when compactions began happening again. The "nodetool compactionstats" suddenly completed to show the outstanding jobs, most in a 100% completion state:
{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace                          table      completed          total    unit   progress
        Compaction   SyncCore      ContactInformationUpdates    61251208033   170643574558   bytes     35.89%
        Compaction   SyncCore                     CommEvents    19262483904    19266079916   bytes     99.98%
        Compaction   SyncCore   EndpointPrefixIndexMinimized     6592197093     6592316682   bytes    100.00%
        Compaction   SyncCore           EmailHistogramDeltas     3411039555     3411039557   bytes    100.00%
        Compaction   SyncCore        ContactPrefixBytesIndex     2879241009     2879487621   bytes     99.99%
        Compaction   SyncCore               EndpointProfiles    21252493623    21252635196   bytes    100.00%
        Compaction   SyncCore                     CommEvents    81009853587    81009854438   bytes    100.00%
        Compaction   SyncCore             EndpointIndexIntId     3005734580     3005768582   bytes    100.00%
Active compaction remaining time : n/a
{code}
I was also doing periodic "nodetool tpstats" runs, which were working but not being logged in system.log on the StatusLogger thread until after compaction started working again.
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515:
---
Description:
After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more. Once it reaches this state, "nodetool compactionstats" hangs. Eventually C* restarts without errors, the cleanup occurs, and the commit logs shrink back down again.
{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace   table      completed          total    unit   progress
        Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
        Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
        Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
        Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
        Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
        Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
        Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
        Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
Active compaction remaining time : n/a
{code}
I was also doing periodic "nodetool tpstats" runs, which were working but not being logged in system.log on the StatusLogger thread until after compaction started working again.

was:
After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more. Once it reaches this state, "nodetool compactionstats" hangs. I watched the recovery live when compactions began happening again. The "nodetool compactionstats" suddenly completed to show the outstanding jobs, most in a 100% completion state:
{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace   table      completed          total    unit   progress
        Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
        Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
        Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
        Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
        Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
        Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
        Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
        Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
Active compaction remaining time : n/a
{code}
I was also doing periodic "nodetool tpstats" runs, which were working but not being logged in system.log on the StatusLogger thread until after compaction started working again.
[jira] [Updated] (CASSANDRA-10515) Commit logs back up with move to 2.1.10
[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Griffith updated CASSANDRA-10515:
---
Summary: Commit logs back up with move to 2.1.10 (was: Compaction hangs with move to 2.1.10)
[jira] [Created] (CASSANDRA-10515) Compaction hangs with move to 2.1.10
Jeff Griffith created CASSANDRA-10515:
---
             Summary: Compaction hangs with move to 2.1.10
                 Key: CASSANDRA-10515
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: redhat 6.5, cassandra 2.1.10
            Reporter: Jeff Griffith
            Priority: Critical

After upgrading from cassandra 2.0.x to 2.1.10, we began seeing problems where some nodes break the 12G commit log max we configured and go as high as 65G or more. Once it reaches this state, "nodetool compactionstats" hangs. I watched the recovery live when compactions began happening again. The "nodetool compactionstats" suddenly completed to show the outstanding jobs, most in a 100% completion state:
{code}
jgriffith@prod1xc1.c2.bf1:~$ ndc
pending tasks: 2185
   compaction type   keyspace                          table      completed          total    unit   progress
        Compaction   SyncCore      ContactInformationUpdates    61251208033   170643574558   bytes     35.89%
        Compaction   SyncCore                     CommEvents    19262483904    19266079916   bytes     99.98%
        Compaction   SyncCore   EndpointPrefixIndexMinimized     6592197093     6592316682   bytes    100.00%
        Compaction   SyncCore           EmailHistogramDeltas     3411039555     3411039557   bytes    100.00%
        Compaction   SyncCore        ContactPrefixBytesIndex     2879241009     2879487621   bytes     99.99%
        Compaction   SyncCore               EndpointProfiles    21252493623    21252635196   bytes    100.00%
        Compaction   SyncCore                     CommEvents    81009853587    81009854438   bytes    100.00%
        Compaction   SyncCore             EndpointIndexIntId     3005734580     3005768582   bytes    100.00%
Active compaction remaining time : n/a
{code}
I was also doing periodic "nodetool tpstats" runs, which were working but not being logged in system.log on the StatusLogger thread until after compaction started working again.