[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348394#comment-15348394 ]

mlowicki commented on CASSANDRA-10992:
--------------------------------------

We've been using C* 2.1.14 for a couple of weeks now and have seen no hanging streaming sessions so far.

> Hanging streaming sessions
> --------------------------
>
>                 Key: CASSANDRA-10992
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10992
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.12, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Paulo Motta
>             Fix For: 2.1.12
>
>         Attachments: apache-cassandra-2.1.12-SNAPSHOT.jar, db1.ams.jstack, db6.analytics.jstack
>
> I've recently started running repair using [Cassandra Reaper|https://github.com/spotify/cassandra-reaper] (the built-in {{nodetool repair}} doesn't work for me - CASSANDRA-9935). It behaves fine, but I've noticed hanging streaming sessions:
> {code}
> root@db1:~# date
> Sat Jan 9 16:43:00 UTC 2016
> root@db1:~# nt netstats -H | grep total
>     Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
>     Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total
>     Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total
>     Sending 5 files, 61.15 MB total. Already sent 5 files, 61.15 MB total
>     Receiving 4 files, 7.75 MB total. Already received 3 files, 7.58 MB total
>     Sending 4 files, 4.29 MB total. Already sent 4 files, 4.29 MB total
>     Receiving 12 files, 13.79 MB total. Already received 11 files, 7.66 MB total
>     Sending 5 files, 15.32 MB total. Already sent 5 files, 15.32 MB total
>     Receiving 8 files, 20.35 MB total. Already received 1 files, 13.63 MB total
>     Sending 38 files, 125.34 MB total. Already sent 38 files, 125.34 MB total
> root@db1:~# date
> Sat Jan 9 17:45:42 UTC 2016
> root@db1:~# nt netstats -H | grep total
>     Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
>     Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total
>     Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total
>     Sending 5 files, 61.15 MB total. Already sent 5 files, 61.15 MB total
>     Receiving 4 files, 7.75 MB total. Already received 3 files, 7.58 MB total
>     Sending 4 files, 4.29 MB total. Already sent 4 files, 4.29 MB total
>     Receiving 12 files, 13.79 MB total. Already received 11 files, 7.66 MB total
>     Sending 5 files, 15.32 MB total. Already sent 5 files, 15.32 MB total
>     Receiving 8 files, 20.35 MB total. Already received 1 files, 13.63 MB total
>     Sending 38 files, 125.34 MB total. Already sent 38 files, 125.34 MB total
> {code}
> Such sessions are left behind even when the repair job has long since finished (confirmed by checking Reaper's and Cassandra's logs). {{streaming_socket_timeout_in_ms}} in cassandra.yaml is set to the default value (360).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
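Not part of the ticket, but the manual comparison the reporter does above (running `nodetool netstats` twice, an hour apart, and eyeballing the `total` lines) can be sketched as a small script. The snapshot strings here are sample data standing in for live `nodetool netstats -H | grep total` output; any `total` line that is byte-for-byte identical in both snapshots made no progress in between (note this also matches sessions that had already finished sending, so it is a triage aid, not a precise detector):

```shell
#!/usr/bin/env bash
# Sketch (not from the ticket): flag stream lines with zero progress between
# two `nodetool netstats -H | grep total` snapshots taken ~1 hour apart.
# Sample data stands in for live nodetool output.
snap_a='Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total
Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total'
snap_b='Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total
Receiving 6 files, 64.15 MB total. Already received 1 files, 12.34 MB total'

# comm -12 prints lines common to both inputs (comm requires sorted input);
# common lines are the ones that did not change between the snapshots.
stalled=$(comm -12 <(sort <<<"$snap_a") <(sort <<<"$snap_b"))
printf '%s\n' "$stalled"
```

In this sample the third receiving stream advanced (12.14 MB → 12.34 MB), so only the first two lines are reported as unchanged.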
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315855#comment-15315855 ]

mlowicki commented on CASSANDRA-9935:
-------------------------------------

[~pauloricardomg] any ETA for the 2.1.15 release?

> Repair fails with RuntimeException
> ----------------------------------
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.8, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Paulo Motta
>             Fix For: 2.1.15, 3.6, 3.0.6, 2.2.7
>
>         Attachments: 9935.patch, db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702). After upgrading to 2.1.8 it started to work faster, but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
>         at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
>         at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> The last records in the logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least twice in the attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range (5765414319217852786,5781018794516851576] failed with error org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80]
>         at
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303849#comment-15303849 ]

mlowicki commented on CASSANDRA-10992:
--------------------------------------

[~pauloricardomg] yes:
{code}
WARN  [Thread-154755] 2016-05-26 17:36:20,625 CompressedInputStream.java:190 - Error while reading compressed input stream.
WARN  [STREAM-IN-/10.210.59.151] 2016-05-26 17:36:20,625 CompressedStreamReader.java:115 - [Stream 14d2bb50-2366-11e6-aff3-094ba808857e] Error while reading partition DecoratedKey(-8649238600224809230, 000933303034383932393204934600) from stream on ks='sync' and table='entity_by_id2'.
WARN  [Thread-156292] 2016-05-26 19:52:29,073 CompressedInputStream.java:190 - Error while reading compressed input stream.
WARN  [STREAM-IN-/10.210.59.84] 2016-05-26 19:52:29,073 CompressedStreamReader.java:115 - [Stream 040b4041-2379-11e6-a363-41a0407f7ce6] Error while reading partition DecoratedKey(-3970687134714418221, 000933303533393631373204000276d800) from stream on ks='sync' and table='entity_by_id2'.
WARN  [Thread-157643] 2016-05-26 23:17:09,393 CompressedInputStream.java:190 - Error while reading compressed input stream.
WARN  [STREAM-IN-/10.210.59.86] 2016-05-26 23:17:09,393 CompressedStreamReader.java:115 - [Stream 97753900-2395-11e6-b5a2-b9dde4344a60] Error while reading partition DecoratedKey(2694075662350043685, 00093238313135323204808800) from stream on ks='sync' and table='entity_by_id2'.
{code}
[jira] [Updated] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-10992:
---------------------------------
    Attachment: db6.analytics.jstack
                db1.ams.jstack
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288768#comment-15288768 ]

mlowicki commented on CASSANDRA-10992:
--------------------------------------

We have 3 datacenters (ams, lati, and analytics, the last being a virtual datacenter on OpenStack). Looking at the list of active streams in OpsCenter, one node in each pair is always from OpenStack (the analytics cluster), but since I've restarted all analytics nodes and there are still lots of hanging sessions, it's not purely related to them. Attaching jstack output from two nodes. I've also doubled the timeout (to 2 hours) and will soon start a new repair run.
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283451#comment-15283451 ]

mlowicki commented on CASSANDRA-10992:
--------------------------------------

Upgrading to 2.1.14 didn't help. Even almost 12h after the end of the repair run (using Cassandra Reaper) I still have active streams (all with progress at 100%). {{streaming_socket_timeout_in_ms}} has the default value (360).
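Since {{streaming_socket_timeout_in_ms}} comes up repeatedly in this thread, here is a small sketch (not from the ticket) of checking the value actually configured on a node. A temp file with a sample fragment stands in for the real config (commonly /etc/cassandra/cassandra.yaml on Debian, an assumption here), and 3600000 ms is used purely as an illustrative value:

```shell
#!/usr/bin/env bash
# Sketch: read streaming_socket_timeout_in_ms from cassandra.yaml.
# A temp file stands in for the real config so the snippet is self-contained.
yaml=$(mktemp)
cat > "$yaml" <<'EOF'
# sample cassandra.yaml fragment (3600000 ms = 1 hour, illustrative only)
streaming_socket_timeout_in_ms: 3600000
EOF
# Split each line on "key: value" and print the value for the matching key.
timeout_ms=$(awk -F': *' '$1 == "streaming_socket_timeout_in_ms" {print $2}' "$yaml")
echo "streaming_socket_timeout_in_ms = $timeout_ms"
rm -f "$yaml"
```

Note the setting is commented out in some cassandra.yaml versions, in which case the lookup prints nothing and the compiled-in default applies.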
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179905#comment-15179905 ]

mlowicki commented on CASSANDRA-10992:
--------------------------------------

Repair finished successfully using Cassandra Reaper. During the whole process (which took a couple of days) Reaper terminated some sessions due to timeout (seen in the logs while watching them live).
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179850#comment-15179850 ]

mlowicki commented on CASSANDRA-10992:
--------------------------------------

The same situation after the upgrade. Hanging streaming sessions are visible in OpsCenter and returned by {{nodetool netstats}}. I waited 2 hours after repair finished.
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171656#comment-15171656 ]

mlowicki commented on CASSANDRA-10992:
--------------------------------------

We've started rolling out the upgrade today, so within a couple of days I should have some feedback.
[jira] [Created] (CASSANDRA-11174) org.apache.cassandra.metrics:type=Streaming,name=ActiveOutboundStreams is always zero
mlowicki created CASSANDRA-11174:
------------------------------------

             Summary: org.apache.cassandra.metrics:type=Streaming,name=ActiveOutboundStreams is always zero
                 Key: CASSANDRA-11174
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11174
             Project: Cassandra
          Issue Type: Bug
         Environment: C* 2.1.12, Debian Wheezy
            Reporter: mlowicki
         Attachments: streams.png

{{org.apache.cassandra.metrics:type=Streaming,name=TotalIncomingBytes}} and {{org.apache.cassandra.metrics:type=Streaming,name=TotalOutgoingBytes}} work fine, but {{org.apache.cassandra.metrics:type=Streaming,name=ActiveOutboundStreams}} is always 0.
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149380#comment-15149380 ]

mlowicki commented on CASSANDRA-10992:
--------------------------------------

We'll upgrade our cluster this week or next (we've been waiting a bit after the release to make sure no critical issues were introduced). Will let you know here when done.
[jira] [Commented] (CASSANDRA-10991) Cleanup OpsCenter keyspace fails - node thinks that didn't joined the ring yet
[ https://issues.apache.org/jira/browse/CASSANDRA-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106872#comment-15106872 ] mlowicki commented on CASSANDRA-10991: -- {code} cqlsh> desc keyspace "OpsCenter"; CREATE KEYSPACE "OpsCenter" WITH replication = {'class': 'NetworkTopologyStrategy', 'Amsterdam': '1', 'Ashburn': '1'} AND durable_writes = true; CREATE TABLE "OpsCenter".events_timeline ( key text, column1 bigint, value blob, PRIMARY KEY (key, column1) ) WITH COMPACT STORAGE AND CLUSTERING ORDER BY (column1 ASC) AND bloom_filter_fp_chance = 0.01 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' AND comment = '{"info": "OpsCenter management data.", "version": [5, 2, 1]}' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND dclocal_read_repair_chance = 0.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.25 AND speculative_retry = 'NONE'; CREATE TABLE "OpsCenter".settings ( key blob, column1 blob, value blob, PRIMARY KEY (key, column1) ) WITH COMPACT STORAGE AND CLUSTERING ORDER BY (column1 ASC) AND bloom_filter_fp_chance = 0.01 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' AND comment = '{"info": "OpsCenter management data.", "version": [5, 2, 1]}' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND dclocal_read_repair_chance = 0.0 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 1.0 AND speculative_retry = 'NONE'; ... {code} Ah I see that "Analytics" is missing in {{replication}}. 
> Cleanup OpsCenter keyspace fails - node thinks that didn't joined the ring yet > -- > > Key: CASSANDRA-10991 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10991 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.12, Debian Wheezy >Reporter: mlowicki >Assignee: Marcus Eriksson > Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x > > > I've C* cluster spread across 3 DCs. Running {{cleanup}} on all nodes in one > DC always fails: > {code} > root@db1:~# nt cleanup system > root@db1:~# nt cleanup sync > root@db1:~# nt cleanup OpsCenter > Aborted cleaning up atleast one column family in keyspace OpsCenter, check > server logs for more information. > error: nodetool failed, check server logs > -- StackTrace -- > java.lang.RuntimeException: nodetool failed, check server logs > at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:292) > at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:204) > root@db1:~# > {code} > Checked two other DCs and running cleanup there works fine (it didn't fail > immediately). > Output from {{nodetool status}} from one node in problematic DC: > {code} > root@db1:~# nt status > Datacenter: Amsterdam > = > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- AddressLoad Tokens OwnsHost ID > Rack > UN 10.210.3.162 518.54 GB 256 ? > 50e606f5-e893-4a3b-86d3-1e5986dceea9 RAC1 > UN 10.210.3.230 532.63 GB 256 ? > 7b8fc988-8a6a-4d94-ae84-ab9da9ab01e8 RAC1 > UN 10.210.3.161 538.82 GB 256 ? > d44b0f6d-7933-4a7c-ba7b-f8648e038f85 RAC1 > UN 10.210.3.160 497.6 GB 256 ? > e7332179-a47e-471d-bcd4-08c638ab9ea4 RAC1 > UN 10.210.3.224 334.25 GB 256 ? > 92b0bd8c-0a5a-446a-83ea-2feea4988fe3 RAC1 > UN 10.210.3.118 518.34 GB 256 ? > ebddeaf3-1433-4372-a4ca-9c7ba3d4a26b RAC1 > UN 10.210.3.221 516.57 GB 256 ? > 44d67a49-5310-4ab5-b448-a44be350abf5 RAC1 > UN 10.210.3.117 493.83 GB 256 ? 
> aae92956-82d6-421e-8f3f-22393ac7e5f7 RAC1 > Datacenter: Analytics > = > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- AddressLoad Tokens OwnsHost ID > Rack > UN 10.210.59.124 392.83 GB 320 ? > f770a8cc-b7bf-44ac-8cc0-214d9228dfcd RAC1 > UN 10.210.59.151 411.9 GB 320 ? > 3cc87422-0e43-4cd1-91bf-484f121be072 RAC1 > UN 10.210.58.132 309.8 GB 256 ? > 84d94d13-28d3-4b49-a3d9-557ab47e79b9 RAC1 > UN 10.210.58.133 281.82
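The comment above spots the likely root of the confusion: the "Analytics" DC appears in {{nodetool status}} but is missing from the OpsCenter keyspace's {{replication}} map. This class of mismatch is easy to check for mechanically by diffing the datacenters known to the ring against those named in the NetworkTopologyStrategy options. A minimal sketch, with values transcribed from the output quoted above (the helper function name is ours, not a Cassandra API):

```python
# Sketch: flag datacenters that appear in `nodetool status` but are absent
# from a keyspace's NetworkTopologyStrategy replication map.

def missing_datacenters(status_dcs, replication):
    """Return DCs known to the ring but not named in the replication map."""
    # Every key other than 'class' in a NetworkTopologyStrategy map is a DC name.
    replicated = {dc for dc in replication if dc != 'class'}
    return sorted(set(status_dcs) - replicated)

# Transcribed from the `desc keyspace "OpsCenter"` output quoted above.
opscenter_replication = {
    'class': 'NetworkTopologyStrategy',
    'Amsterdam': '1',
    'Ashburn': '1',
}
ring_dcs = ['Amsterdam', 'Analytics', 'Ashburn']  # from `nodetool status`

print(missing_datacenters(ring_dcs, opscenter_replication))  # prints ['Analytics']
```

Running this against the quoted cluster state confirms that only "Analytics" is unreplicated for OpsCenter.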
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102147#comment-15102147 ] mlowicki commented on CASSANDRA-10992: -- [~pauloricardomg] Has CASSANDRA-10961 been released for 2.1? Replacing our production cluster with the attached build is a solid amount of work, so if the fix lands in an upcoming 2.1.x release we would rather test the patch after upgrading the nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10992) Hanging streaming sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-10992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094919#comment-15094919 ] mlowicki commented on CASSANDRA-10992: -- Some IO errors I've found in logs: {code} ERROR [Thread-518762] 2016-01-12 14:36:11,130 CassandraDaemon.java:227 - Exception in thread Thread[Thread-518762,5,main] java.lang.RuntimeException: java.io.IOException: Connection timed out at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.12.jar:2.1.12] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66] Caused by: java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_66] at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_66] at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_66] at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_66] at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_66] at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:59) ~[na:1.8.0_66] at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109) ~[na:1.8.0_66] at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_66] at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:178) ~[apache-cassandra-2.1.12.jar:2.1.12] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.12.jar:2.1.12] ... 
1 common frames omitted {code} {code} ERROR [STREAM-IN-/10.210.58.133] 2016-01-12 15:01:39,450 StreamSession.java:505 - [Stream #193dd5c0-b93b-11e5-a713-8fe7d1d062ea] Streaming error occurred java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_66] at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_66] at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_66] at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_66] at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_66] at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51) ~[apache-cassandra-2.1.12.jar:2.1.12] at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250) ~[apache-cassandra-2.1.12.jar:2.1.12] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66] INFO [STREAM-IN-/10.210.58.133] 2016-01-12 15:01:39,451 StreamResultFuture.java:180 - [Stream #193dd5c0-b93b-11e5-a713-8fe7d1d062ea] Session with /10.210.58.133 is complete WARN [STREAM-IN-/10.210.58.133] 2016-01-12 15:01:39,451 StreamResultFuture.java:207 - [Stream #193dd5c0-b93b-11e5-a713-8fe7d1d062ea] Stream failed {code} {code} ERROR [Thread-404196] 2016-01-12 14:44:05,532 CassandraDaemon.java:227 - Exception in thread Thread[Thread-404196,5,main] java.lang.RuntimeException: java.nio.channels.AsynchronousCloseException at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.12.jar:2.1.12] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66] Caused by: java.nio.channels.AsynchronousCloseException: null at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205) ~[na:1.8.0_66] at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:407) ~[na:1.8.0_66] at 
sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:59) ~[na:1.8.0_66] at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109) ~[na:1.8.0_66] at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_66] at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:178) ~[apache-cassandra-2.1.12.jar:2.1.12] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.12.jar:2.1.12] ... 1 common frames omitted {code} {code} ERROR [STREAM-OUT-/10.210.3.224] 2016-01-12 14:44:12,114 StreamSession.java:505 - [Stream #e7af3850-b93a-11e5-bebc-2f019a24a954] Streaming error occurred java.io.IOException: Broken pipe at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_66] at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:427) ~[na:1.8.0_66] at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:492) ~[na:1.8.0_66] at
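Each streaming error above carries the stream session UUID in square brackets ({{[Stream #193dd5c0-...]}}), which is what ties a "Streaming error occurred" line to the later "Stream failed" line for the same session. When digging through logs like these, it can help to group errors by that UUID. A rough sketch, assuming log lines shaped like the ones quoted here (the regex and helper are ours, not Cassandra's):

```python
import re

# Matches lines like:
#   ERROR [STREAM-IN-/10.210.58.133] ... StreamSession.java:505 -
#   [Stream #193dd5c0-...] Streaming error occurred java.io.IOException: ...
STREAM_ERR = re.compile(
    r"\[Stream #(?P<sid>[0-9a-f-]+)\] Streaming error occurred\s+"
    r"(?P<exc>[\w.$]+Exception)"
)

def errors_by_session(log_text):
    """Map stream session UUID -> list of exception class names seen for it."""
    out = {}
    for m in STREAM_ERR.finditer(log_text):
        out.setdefault(m.group("sid"), []).append(m.group("exc"))
    return out

sample = (
    "ERROR [STREAM-IN-/10.210.58.133] StreamSession.java:505 - "
    "[Stream #193dd5c0-b93b-11e5-a713-8fe7d1d062ea] Streaming error occurred "
    "java.io.IOException: Connection timed out"
)
print(errors_by_session(sample))
# prints {'193dd5c0-b93b-11e5-a713-8fe7d1d062ea': ['java.io.IOException']}
```

Grouping this way makes it obvious whether one session failed repeatedly or many sessions each failed once, which matters when correlating with the hanging {{netstats}} entries.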
[jira] [Created] (CASSANDRA-10991) Cleanup OpsCenter keyspace fails - node thinks that didn't joined the ring yet
mlowicki created CASSANDRA-10991: Summary: Cleanup OpsCenter keyspace fails - node thinks that didn't joined the ring yet Key: CASSANDRA-10991 URL: https://issues.apache.org/jira/browse/CASSANDRA-10991 Project: Cassandra Issue Type: Bug Environment: C* 2.1.12, Debian Wheezy Reporter: mlowicki Fix For: 2.1.12 I've C* cluster spread across 3 DCs. Running {{cleanup}} on all nodes in one DC always fails: {code} root@db1:~# nt cleanup system root@db1:~# nt cleanup sync root@db1:~# nt cleanup OpsCenter Aborted cleaning up atleast one column family in keyspace OpsCenter, check server logs for more information. error: nodetool failed, check server logs -- StackTrace -- java.lang.RuntimeException: nodetool failed, check server logs at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:292) at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:204) root@db1:~# {code} Checked two other DCs and running cleanup there works fine (it didn't fail immediately). Output from {{nodetool status}} from one node in problematic DC: {code} root@db1:~# nt status Datacenter: Amsterdam = Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- AddressLoad Tokens OwnsHost ID Rack UN 10.210.3.162 518.54 GB 256 ? 50e606f5-e893-4a3b-86d3-1e5986dceea9 RAC1 UN 10.210.3.230 532.63 GB 256 ? 7b8fc988-8a6a-4d94-ae84-ab9da9ab01e8 RAC1 UN 10.210.3.161 538.82 GB 256 ? d44b0f6d-7933-4a7c-ba7b-f8648e038f85 RAC1 UN 10.210.3.160 497.6 GB 256 ? e7332179-a47e-471d-bcd4-08c638ab9ea4 RAC1 UN 10.210.3.224 334.25 GB 256 ? 92b0bd8c-0a5a-446a-83ea-2feea4988fe3 RAC1 UN 10.210.3.118 518.34 GB 256 ? ebddeaf3-1433-4372-a4ca-9c7ba3d4a26b RAC1 UN 10.210.3.221 516.57 GB 256 ? 44d67a49-5310-4ab5-b448-a44be350abf5 RAC1 UN 10.210.3.117 493.83 GB 256 ? aae92956-82d6-421e-8f3f-22393ac7e5f7 RAC1 Datacenter: Analytics = Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- AddressLoad Tokens OwnsHost ID Rack UN 10.210.59.124 392.83 GB 320 ? 
f770a8cc-b7bf-44ac-8cc0-214d9228dfcd RAC1 UN 10.210.59.151 411.9 GB 320 ? 3cc87422-0e43-4cd1-91bf-484f121be072 RAC1 UN 10.210.58.132 309.8 GB 256 ? 84d94d13-28d3-4b49-a3d9-557ab47e79b9 RAC1 UN 10.210.58.133 281.82 GB 256 ? 02bd2d02-41c5-4193-81b0-dee434adb0da RAC1 UN 10.210.59.86 285.84 GB 256 ? bc6422ea-22e9-431a-ac16-c4c040f0c4e5 RAC1 UN 10.210.59.84 331.06 GB 256 ? a798e6b0-3a84-4ec2-82bb-8474086cb315 RAC1 UN 10.210.59.85 366.26 GB 256 ? 52699077-56cf-4c1e-b308-bf79a1644b7e RAC1 Datacenter: Ashburn === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- AddressLoad Tokens OwnsHost ID Rack UN 10.195.15.176 534.51 GB 256 ? c6ac22df-c43a-4b25-b3b5-5e12ce9c69da RAC1 UN 10.195.15.177 313.73 GB 256 ? eafa2a72-84a2-4cdc-a634-3c660acc6af8 RAC1 UN 10.195.15.163 470.92 GB 256 ? bcd2a534-94c4-4406-8d16-c1fc26b41844 RAC1 UN 10.195.15.162 539.82 GB 256 ? bb649cef-21de-4077-a35f-994319011a06 RAC1 UN 10.195.15.182 499.64 GB 256 ? 6ce2d14d-9fb8-4494-8e97-3add05bd35de RAC1 UN 10.195.15.167 508.48 GB 256 ? 6f359675-852a-4842-9ff2-bdc69e6b04a2 RAC1 UN 10.195.15.166 490.28 GB 256 ? 1ec5d0c5-e8bd-4973-96d9-523de91d08c5 RAC1 UN 10.195.15.183 447.78 GB 256 ? 
824165b0-1f1b-40e8-9695-e2f596cb8611 RAC1 Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless {code} Logs from one of the nodes where {{cleanup}} fails: {code} INFO [RMI TCP Connection(158004)-10.210.59.86] 2016-01-09 15:58:33,942 CompactionManager.java:388 - Cleanup cannot run before a node has joined the ring INFO [RMI TCP Connection(158004)-10.210.59.86] 2016-01-09 15:58:33,970 CompactionManager.java:388 - Cleanup cannot run before a node has joined the ring INFO [RMI TCP Connection(158004)-10.210.59.86] 2016-01-09 15:58:34,000 CompactionManager.java:388 - Cleanup cannot run before a node has joined the ring INFO [RMI TCP Connection(158004)-10.210.59.86] 2016-01-09 15:58:34,027 CompactionManager.java:388 - Cleanup cannot run before a node has joined the ring INFO [RMI TCP Connection(158004)-10.210.59.86] 2016-01-09 15:58:34,053 CompactionManager.java:388 - Cleanup cannot run before a node has joined the ring INFO [RMI TCP Connection(158004)-10.210.59.86] 2016-01-09 15:58:34,082 CompactionManager.java:388 - Cleanup cannot run before a node has joined the ring INFO
[jira] [Created] (CASSANDRA-10992) Hanging streaming sessions
mlowicki created CASSANDRA-10992: Summary: Hanging streaming sessions Key: CASSANDRA-10992 URL: https://issues.apache.org/jira/browse/CASSANDRA-10992 Project: Cassandra Issue Type: Bug Environment: C* 2.1.12, Debian Wheezy Reporter: mlowicki Fix For: 2.1.12 I've started recently running repair using [Cassandra Reaper|https://github.com/spotify/cassandra-reaper] (built-in {{nodetool repair}} doesn't work for me - CASSANDRA-9935). It behaves fine but I've noticed hanging streaming sessions: {code} root@db1:~# date Sat Jan 9 16:43:00 UTC 2016 root@db1:~# nt netstats -H | grep total Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total Sending 5 files, 61.15 MB total. Already sent 5 files, 61.15 MB total Receiving 4 files, 7.75 MB total. Already received 3 files, 7.58 MB total Sending 4 files, 4.29 MB total. Already sent 4 files, 4.29 MB total Receiving 12 files, 13.79 MB total. Already received 11 files, 7.66 MB total Sending 5 files, 15.32 MB total. Already sent 5 files, 15.32 MB total Receiving 8 files, 20.35 MB total. Already received 1 files, 13.63 MB total Sending 38 files, 125.34 MB total. Already sent 38 files, 125.34 MB total root@db1:~# date Sat Jan 9 17:45:42 UTC 2016 root@db1:~# nt netstats -H | grep total Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total Receiving 6 files, 64.15 MB total. Already received 1 files, 12.14 MB total Sending 5 files, 61.15 MB total. Already sent 5 files, 61.15 MB total Receiving 4 files, 7.75 MB total. Already received 3 files, 7.58 MB total Sending 4 files, 4.29 MB total. Already sent 4 files, 4.29 MB total Receiving 12 files, 13.79 MB total. Already received 11 files, 7.66 MB total Sending 5 files, 15.32 MB total. 
Already sent 5 files, 15.32 MB total Receiving 8 files, 20.35 MB total. Already received 1 files, 13.63 MB total Sending 38 files, 125.34 MB total. Already sent 38 files, 125.34 MB total {code} Such sessions linger even long after the repair job has finished (confirmed by checking Reaper's and Cassandra's logs). {{streaming_socket_timeout_in_ms}} in cassandra.yaml is set to the default value (360). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
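The hang is visible in the report above because two {{nodetool netstats -H | grep total}} snapshots taken an hour apart are byte-for-byte identical while "Already received" still trails the session total. That check is straightforward to automate: parse each Receiving/Sending line and flag transfers whose completed file count is below the total. A sketch under the assumption that the line format matches the output quoted above (the parsing helper is ours):

```python
import re

# One progress line per direction, e.g.:
#   Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total
LINE = re.compile(
    r"(Receiving|Sending) (\d+) files, ([\d.]+) MB total\. "
    r"Already (?:received|sent) (\d+) files, ([\d.]+) MB"
)

def incomplete(snapshot):
    """Return (direction, done_files, total_files) for unfinished transfers."""
    out = []
    for m in LINE.finditer(snapshot):
        direction, total_files, _total_mb, done_files, _done_mb = m.groups()
        if int(done_files) < int(total_files):
            out.append((direction, int(done_files), int(total_files)))
    return out

snap = ("Receiving 5 files, 46.59 MB total. Already received 1 files, 11.32 MB total "
        "Sending 7 files, 46.28 MB total. Already sent 7 files, 46.28 MB total")
# If two snapshots taken well apart yield the same incomplete() result,
# the receiving side of that session is likely stuck.
print(incomplete(snap))  # prints [('Receiving', 1, 5)]
```

Note the asymmetry in the quoted output: every Sending entry is complete while the Receiving entries are stalled, which points at the incoming side of the stream (consistent with the {{STREAM-IN}} read timeouts in the attached logs).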
[jira] [Commented] (CASSANDRA-10991) Cleanup OpsCenter keyspace fails - node thinks that didn't joined the ring yet
[ https://issues.apache.org/jira/browse/CASSANDRA-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090699#comment-15090699 ] mlowicki commented on CASSANDRA-10991: -- According to {{nodetool status}} and our metrics, the node joined the ring many weeks ago.
[jira] [Commented] (CASSANDRA-10991) Cleanup OpsCenter keyspace fails - node thinks that didn't joined the ring yet
[ https://issues.apache.org/jira/browse/CASSANDRA-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090702#comment-15090702 ] mlowicki commented on CASSANDRA-10991: -- Yes, all nodes in a single DC have this problem. {{nodetool status}} looks the same in all DCs.
[jira] [Created] (CASSANDRA-10823) LEAK DETECTED (org.apache.cassandra.utils.concurrent.Ref$State@)
mlowicki created CASSANDRA-10823:
Summary: LEAK DETECTED (org.apache.cassandra.utils.concurrent.Ref$State@)
Key: CASSANDRA-10823
URL: https://issues.apache.org/jira/browse/CASSANDRA-10823
Project: Cassandra
Issue Type: Bug
Environment: C* 2.1.11, Debian Wheezy
Reporter: mlowicki

{code}
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,455 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@66909a93) to class org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup@529816960:/var/lib/cassandra/data2/sync/user_quota-fe54df20770e11e4a0a975bb514ae072/sync-user_quota-ka-61776-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,456 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@45868eb2) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@84044743:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,456 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@61f1d862) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1286945834:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,456 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@e8110be) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@997339490:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,456 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4608376b) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1164867000:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,456 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@56f2a6a4) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1419412884:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,456 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@6cb7e2f0) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@479474259:[Memory@[0..4), Memory@[0..11)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,457 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4573f5cd) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1074694490:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,457 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@7a5b9490) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@309770418:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,457 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3057b796) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1322643877:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,498 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3febb012) to class org.apache.cassandra.io.sstable.SSTableReader$DescriptorTypeTidy@175410823:/var/lib/cassandra/data2/sync/entity2-e24b5040199b11e5a30f75bb514ae072/sync-entity2-tmplink-ka-1175811 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,498 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@6a39466d) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1446958230:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,499 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@36f6f016) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@235688075:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2015-12-07 14:09:30,499 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4a7bdce1) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@165830139:[Memory@[0..4), Memory@[0..11)] was not
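The messages above come from Cassandra's Reference-Reaper thread, which watches Ref instances and logs an error for any that is garbage collected without an explicit release(). A minimal Python sketch of that leak-detection idea (hypothetical names; Cassandra's real Ref.java uses Java reference queues, not this API):

```python
import gc
import weakref

leaks = []  # leak reports, analogous to the Ref.java:179 ERROR lines

class Ref:
    """Guards a shared resource; reports a leak if it is garbage
    collected before release() is called. A sketch of the idea behind
    Cassandra's Ref / Reference-Reaper, not its actual implementation."""

    def __init__(self, name):
        self.name = name
        # The finalizer fires when this Ref is collected; if release()
        # was never called, it records a leak report.
        self._finalizer = weakref.finalize(self, Ref._report, name)

    def release(self):
        # Properly released: cancel the pending leak report.
        self._finalizer.detach()

    @staticmethod
    def _report(name):
        leaks.append("LEAK DETECTED: a reference to %s was not released "
                     "before the reference was garbage collected" % name)

good = Ref("sync-user_quota-ka-61776-Index.db")
good.release()                  # released correctly: no report
bad = Ref("[[OffHeapBitSet]]")  # never released: will be reported
del good, bad
gc.collect()                    # ensure the finalizers have run
print(leaks)                    # exactly one report, for the unreleased ref
```

The real reaper only observes leaks after the fact; like here, the fix is to find the code path that drops the reference without releasing it.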
[jira] [Commented] (CASSANDRA-10823) LEAK DETECTED (org.apache.cassandra.utils.concurrent.Ref$State@)
[ https://issues.apache.org/jira/browse/CASSANDRA-10823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045717#comment-15045717 ] mlowicki commented on CASSANDRA-10823:
--
[~tjake] it happened while running drain, so it's probably a dupe of CASSANDRA-10079, which I just found.

> LEAK DETECTED (org.apache.cassandra.utils.concurrent.Ref$State@)
> --
>
> Key: CASSANDRA-10823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10823
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.11, Debian Wheezy
> Reporter: mlowicki
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031827#comment-15031827 ] mlowicki commented on CASSANDRA-9935:
--
[~yukim]: any chance this is related to network issues? Over the weekend I monitored it carefully, and the repair failed at the same time I saw a drop in the number of requests sent to the C* cluster in this datacenter. I decided to run repair on smaller tables, where it takes 1-4 hours to complete, and it failed once (launched on 6 nodes), again when such a drop appeared. I tried a second time and now it works (and I don't see any anomalies in the metrics).

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
> Reporter: mlowicki
> Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702), but after the upgrade to 2.1.8 it started to work faster; now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde > for range (-5474076923322749342,-5468600594078911162] finished > [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde > for range (-8631877858109464676,-8624040066373718932] finished > [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde > for range (-5372806541854279315,-5369354119480076785] finished > [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde > for range (8166489034383821955,8168408930184216281] finished > [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde > for range (6084602890817326921,6088328703025510057] finished > [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde > for range (-781874602493000830,-781745173070807746] finished > [2015-07-29 20:44:03,957] Repair command #4 finished > error: nodetool failed, check server logs > -- StackTrace -- > java.lang.RuntimeException: nodetool failed, check server logs > at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) > at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) > {code} > After running: > {code} > nodetool repair --partitioner-range --parallel --in-local-dc sync > {code} > Last records in logs regarding repair are: > {code} > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range > (-7695808664784761779,-7693529816291585568] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range > (806371695398849,8065203836608925992] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range > (-5474076923322749342,-5468600594078911162] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 
StorageService.java:2952 - > Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range > (-8631877858109464676,-8624040066373718932] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range > (-5372806541854279315,-5369354119480076785] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range > (8166489034383821955,8168408930184216281] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range > (6084602890817326921,6088328703025510057] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range > (-781874602493000830,-781745173070807746] finished > {code} > but a bit above I see (at least two times in attached log): > {code} > ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - > Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range > (5765414319217852786,5781018794516851576] failed with error > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > org.apache.cassandra.exceptions.RepairException: [repair >
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030436#comment-15030436 ] mlowicki commented on CASSANDRA-9935: - Tried to run repair once again after online scrub and cleanup on all nodes. Failed with the same error. This is what I've found in logs: {code} ERROR [ValidationExecutor:1089] 2015-11-28 04:33:15,865 Validator.java:245 - Failed creating a merkle tree for [repair #0f9c5530-9589-11e5-b036-75bb514ae072 on sync/entity2, (-6842825601551036942,-6841068234348096268]], /10.210.3.221 (see log for details) ERROR [ValidationExecutor:1089] 2015-11-28 04:33:15,866 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:1089,1,main] java.lang.AssertionError: row DecoratedKey(-6842806631972123001, 000932383331343239333204c3c700) received out of order wrt DecoratedKey(-6841074726771668561, 000932313637353230343404c3c700) at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] ERROR [AntiEntropySessions:1957] 2015-11-28 04:33:15,868 RepairSession.java:303 - [repair #0f9c5530-9589-11e5-b036-75bb514ae072] session completed with the following error org.apache.cassandra.exceptions.RepairException: [repair 
#0f9c5530-9589-11e5-b036-75bb514ae072 on sync/entity2, (-6842825601551036942,-6841068234348096268]] Validation failed in /10.210.3.221 at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] {code} {code} ERROR [AntiEntropySessions:1957] 2015-11-28 04:33:15,869 CassandraDaemon.java:227 - Exception in thread Thread[AntiEntropySessions:1957,5,RMI Runtime] java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #0f9c5530-9589-11e5-b036-75bb514ae072 on sync/entity2, (-6842825601551036942,-6841068234348096268]] Validation failed in /10.210.3.221 at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] Caused by: org.apache.cassandra.exceptions.RepairException: [repair 
#0f9c5530-9589-11e5-b036-75bb514ae072 on sync/entity2, (-6842825601551036942,-6841068234348096268]] Validation failed in /10.210.3.221 at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11] ... 3 common frames omitted {code} {code} ERROR
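The root cause in the log above is the assertion in Validator.add: rows fed into the merkle-tree build during validation must arrive in strictly increasing token order, and an out-of-order DecoratedKey (typically a symptom of a corrupt sstable) aborts the validation and fails the repair session. A minimal Python sketch of that ordering check (hypothetical helper, not Cassandra's API):

```python
class ValidationError(Exception):
    """Raised when a row arrives out of token order, mirroring the
    AssertionError thrown by Validator.add in the log above."""

def validate_order(tokens):
    """Require strictly increasing tokens -- a simplified sketch of the
    check that fails with 'received out of order wrt' during repair."""
    last = None
    for token in tokens:
        if last is not None and token <= last:
            raise ValidationError(
                "row %d received out of order wrt %d" % (token, last))
        last = token
    return True

# Ascending tokens validate fine:
validate_order([-6842825601551036942, -6841074726771668561])
# Reversing them would raise ValidationError, which is what aborts the
# merkle-tree build and marks the validation as failed on that replica.
```

This is why an offline or online scrub of the affected sstables is the usual remedy: it rewrites the data back into sorted order.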
[jira] [Updated] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-9935:
Attachment: system.log.10.210.3.117

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
> Reporter: mlowicki
> Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
[jira] [Updated] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-9935:
Attachment: system.log.10.210.3.230
            system.log.10.210.3.221

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
> Reporter: mlowicki
> Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.221, system.log.10.210.3.230
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030439#comment-15030439 ] mlowicki commented on CASSANDRA-9935:
--
Also, if I run repair only for the range where I got this "Endpoint X died" error, it works fine:
{code}
root@db1:~# time nodetool repair --in-local-dc -st 8066543735336862962 -et 8074446636728465478
[2015-11-28 08:55:19,048] Nothing to repair for keyspace 'system'
[2015-11-28 08:55:19,069] Starting repair command #6, repairing 1 ranges for keyspace OpsCenter (parallelism=SEQUENTIAL, full=true)
[2015-11-28 08:55:19,176] Repair command #6 finished
[2015-11-28 08:55:19,188] Starting repair command #7, repairing 1 ranges for keyspace sync (parallelism=SEQUENTIAL, full=true)
[2015-11-28 09:03:49,529] Repair session c054ec60-95ad-11e5-b036-75bb514ae072 for range (8066543735336862962,8074446636728465478] finished
[2015-11-28 09:03:49,529] Repair command #7 finished
[2015-11-28 09:03:49,544] Starting repair command #8, repairing 1 ranges for keyspace system_traces (parallelism=SEQUENTIAL, full=true)
[2015-11-28 09:03:49,562] Repair command #8 finished

real    8m32.356s
user    0m2.784s
sys     0m0.224s
{code}

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
> Reporter: mlowicki
> Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
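The comment above narrows repair to a single failing token subrange via `-st`/`-et`. The same idea scales to splitting the whole ring into small subranges and repairing them one at a time (which is also what Cassandra Reaper automates). A sketch of the range-splitting arithmetic, with a hypothetical helper that is not part of nodetool:

```python
def split_range(start, end, parts):
    """Split a token range (start, end] into `parts` contiguous
    subranges; the last subrange absorbs any rounding remainder.
    Hypothetical helper for driving `nodetool repair -st <s> -et <e>`."""
    width = (end - start) // parts
    bounds = [start + i * width for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))

# The subrange repaired successfully in the comment above, split four ways:
for st, et in split_range(8066543735336862962, 8074446636728465478, 4):
    # In practice you would shell out to nodetool for each subrange.
    print("nodetool repair --in-local-dc -st %d -et %d" % (st, et))
```

Smaller subranges keep each validation and streaming session short, so a single failure (a dead endpoint, a corrupt sstable) costs only that subrange rather than the whole repair command.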
[jira] [Created] (CASSANDRA-10782) AssertionError at getApproximateKeyCount
mlowicki created CASSANDRA-10782:
Summary: AssertionError at getApproximateKeyCount
Key: CASSANDRA-10782
URL: https://issues.apache.org/jira/browse/CASSANDRA-10782
Project: Cassandra
Issue Type: Bug
Environment: C* 2.1.11, Debian Wheezy
Reporter: mlowicki

{code}
ERROR [CompactionExecutor:9797] 2015-11-28 09:20:10,361 CassandraDaemon.java:227 - Exception in thread Thread[CompactionExecutor:9797,1,main]
java.lang.AssertionError: /var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-6335-Data.db
    at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:268) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10782) AssertionError at getApproximateKeyCount
[ https://issues.apache.org/jira/browse/CASSANDRA-10782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-10782:
- Description:
{code}
ERROR [CompactionExecutor:9845] 2015-11-28 09:26:10,525 CassandraDaemon.java:227 - Exception in thread Thread[CompactionExecutor:9845,1,main]
java.lang.AssertionError: /var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-6335-Data.db
        at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:268) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
{code}

was:
{code}
ERROR [CompactionExecutor:9797] 2015-11-28 09:20:10,361 CassandraDaemon.java:227 - Exception in thread Thread[CompactionExecutor:9797,1,main]
java.lang.AssertionError: /var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-6335-Data.db
        at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:268) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
{code}

> AssertionError at getApproximateKeyCount
>
> Key: CASSANDRA-10782
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10782
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.11, Debian Wheezy
> Reporter: mlowicki
>
> {code}
> ERROR [CompactionExecutor:9845] 2015-11-28 09:26:10,525 CassandraDaemon.java:227 - Exception in thread Thread[CompactionExecutor:9845,1,main]
> java.lang.AssertionError: /var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-6335-Data.db
>         at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:268) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236)
>
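For context on the failure shape above: the AssertionError is raised inside SSTableReader.getApproximateKeyCount, and its message is simply the path of the Data.db file for which a key count could not be derived. The following is an illustrative sketch only, with hypothetical names and a toy in-memory "index summary"; it is not Cassandra's actual implementation, it just mirrors how a per-file estimator can surface the offending path in an AssertionError:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of key-count estimation over a set of SSTables. A missing map
// entry stands in for a damaged/unreadable per-file summary. Hypothetical
// names throughout; not the Cassandra code path from the trace.
public class KeyCountSketch {
    // Pretend per-file key counts keyed by Data.db path.
    static final Map<String, Long> INDEX_SUMMARY = new HashMap<>();

    static long approximateKeyCount(List<String> dataFiles) {
        long total = 0;
        for (String path : dataFiles) {
            Long count = INDEX_SUMMARY.get(path);
            if (count == null) {
                // Thrown explicitly (rather than via `assert`) so the check
                // does not depend on the JVM's -ea flag. As in the report,
                // the message is just the offending file's path.
                throw new AssertionError(path);
            }
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        INDEX_SUMMARY.put("ok-Data.db", 42L);
        System.out.println(approximateKeyCount(List.of("ok-Data.db"))); // prints 42
        try {
            approximateKeyCount(List.of("ok-Data.db", "broken-Data.db"));
        } catch (AssertionError e) {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```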
[jira] [Created] (CASSANDRA-10780) Exception encountered during startup
mlowicki created CASSANDRA-10780:

Summary: Exception encountered during startup
Key: CASSANDRA-10780
URL: https://issues.apache.org/jira/browse/CASSANDRA-10780
Project: Cassandra
Issue Type: Bug
Environment: C* 2.1.11 on Debian Wheezy
Reporter: mlowicki

{code}
ERROR [main] 2015-11-27 12:39:42,659 CassandraDaemon.java:579 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
        at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:663) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:306) [apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) [apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) [apache-cassandra-2.1.11.jar:2.1.11]
Caused by: java.lang.NullPointerException: null
        at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:655) ~[apache-cassandra-2.1.11.jar:2.1.11]
        ... 3 common frames omitted
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
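The shape of this failure — a NullPointerException raised inside removeUnfinishedCompactionLeftovers and surfaced to the daemon as an FSReadError that aborts startup — can be sketched as follows. All names and data structures here are hypothetical illustrations of the wrap-and-rethrow pattern visible in the trace, not the actual ColumnFamilyStore code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Illustrative sketch: a lookup during startup cleanup returns null, the
// resulting NullPointerException is wrapped in a storage-layer error, and
// startup aborts. Hypothetical names; not Cassandra's implementation.
public class StartupSketch {
    static class FSReadError extends RuntimeException {
        FSReadError(Throwable cause) { super(cause); }
    }

    // Maps a compaction task id to the SSTable generations it wrote.
    static final Map<UUID, Set<Integer>> UNFINISHED = new HashMap<>();

    static void removeUnfinishedCompactionLeftovers(UUID taskId) {
        try {
            // If the task id is unknown, get() returns null and iterating
            // over it throws NullPointerException.
            for (int generation : UNFINISHED.get(taskId)) {
                System.out.println("would delete generation " + generation);
            }
        } catch (Exception e) {
            // Surface the low-level failure as a storage-layer error,
            // mirroring the FSReadError-wrapping-NPE shape in the trace.
            throw new FSReadError(e);
        }
    }

    public static void main(String[] args) {
        try {
            removeUnfinishedCompactionLeftovers(UUID.randomUUID());
        } catch (FSReadError e) {
            System.out.println("startup failed: " + e.getCause());
        }
    }
}
```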
[jira] [Updated] (CASSANDRA-10780) Exception encountered during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-10780: - Reproduced In: 2.1.11 > Exception encountered during startup > > > Key: CASSANDRA-10780 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10780 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.11 on Debian Wheezy >Reporter: mlowicki > > {code} > ERROR [main] 2015-11-27 12:39:42,659 CassandraDaemon.java:579 - Exception > encountered during startup > org.apache.cassandra.io.FSReadError: java.lang.NullPointerException > at > org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:663) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:306) > [apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) > [apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) > [apache-cassandra-2.1.11.jar:2.1.11] > Caused by: java.lang.NullPointerException: null > at > org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:655) > ~[apache-cassandra-2.1.11.jar:2.1.11] > ... 3 common frames omitted > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028727#comment-15028727 ] mlowicki commented on CASSANDRA-10769:
--
Another error from today (10.210.3.221 is node where I've started repair - still in progress):
{code}
ERROR [ValidationExecutor:588] 2015-11-26 12:48:02,877 Validator.java:245 - Failed creating a merkle tree for [repair #72d57040-943b-11e5-b036-75bb514ae072 on sync/entity2, (-2928915626059257529,-2921716383005026147]], /10.210.3.221 (see log for details)
ERROR [ValidationExecutor:588] 2015-11-26 12:48:02,878 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:588,1,main]
java.lang.AssertionError: row DecoratedKey(-2928866306571865615, 000932383734343432313204b33100) received out of order wrt DecoratedKey(-2921918599167375595, 000933313439393634373204c3c700)
        at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
ERROR [AntiEntropySessions:1061] 2015-11-26 12:48:02,880 RepairSession.java:303 - [repair #72d57040-943b-11e5-b036-75bb514ae072] session completed with the following error
org.apache.cassandra.exceptions.RepairException: [repair #72d57040-943b-11e5-b036-75bb514ae072 on sync/entity2, (-2928915626059257529,-2921716383005026147]] Validation failed in /10.210.3.221
        at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
INFO [AntiEntropySessions:1062] 2015-11-26 12:48:02,880 RepairSession.java:260 - [repair #ee6d25e0-943b-11e5-b036-75bb514ae072] new session: will sync /10.210.3.221, /10.210.3.224, /10.210.3.117 on range (-4713086263421125450,-4709745913912183602] for sync.[device_token, entity2, user_stats, user_device, user_quota, user_store, user_device_progress, entity_by_id2]
ERROR [AntiEntropySessions:1061] 2015-11-26 12:48:02,881 CassandraDaemon.java:227 - Exception in thread Thread[AntiEntropySessions:1061,5,RMI Runtime]
java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #72d57040-943b-11e5-b036-75bb514ae072 on sync/entity2, (-2928915626059257529,-2921716383005026147]] Validation failed in /10.210.3.221
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #72d57040-943b-11e5-b036-75bb514ae072 on sync/entity2, (-2928915626059257529,-2921716383005026147]] Validation failed in /10.210.3.221
        at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at
[jira] [Created] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
mlowicki created CASSANDRA-10769:

Summary: "received out of order wrt DecoratedKey" after scrub
Key: CASSANDRA-10769
URL: https://issues.apache.org/jira/browse/CASSANDRA-10769
Project: Cassandra
Issue Type: Bug
Environment: C* 2.1.11, Debian Wheezy
Reporter: mlowicki

After running scrub and cleanup on all nodes in single data center I'm getting:
{code}
ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,530 Validator.java:245 - Failed creating a merkle tree for [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]], /10.210.3.221 (see log for details)
ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,531 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:103,1,main]
java.lang.AssertionError: row DecoratedKey(-5867787467868737053, 000932373633313036313204808800) received out of order wrt DecoratedKey(-5865937851627253360, 000933313230313737333204c3c700)
        at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
{code}
What I did is to run repair on other node:
{code}
time nodetool repair --in-local-dc
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
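For readers tracing this report: the assertion text means validation compaction consumes rows strictly in token order, and a row whose decorated key compares lower than (or equal to) the previously seen one indicates out-of-order data in an SSTable. A minimal sketch of that ordering invariant follows; the class and method names are hypothetical illustrations, not the real Validator.add, and the comparison is done on a bare long token rather than a full DecoratedKey:

```java
// Illustrative sketch of the invariant behind the "received out of order
// wrt DecoratedKey" AssertionError: each added row's token must be strictly
// greater than the previous one. Hypothetical names; real tokens here are
// Murmur3 hashes of the partition key.
public class OrderSketch {
    private Long lastToken;  // token of the previously added row, if any

    void add(long token) {
        if (lastToken != null && token <= lastToken) {
            // Thrown explicitly (not via `assert`) so behaviour does not
            // depend on the JVM's -ea flag.
            throw new AssertionError(
                "row " + token + " received out of order wrt " + lastToken);
        }
        lastToken = token;
    }

    public static void main(String[] args) {
        OrderSketch v = new OrderSketch();
        v.add(-5867787467868737053L);      // in order: accepted
        v.add(-5865937851627253360L);      // greater token: accepted
        try {
            v.add(-5867793819051725444L);  // smaller token: out of order
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```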
[jira] [Commented] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026472#comment-15026472 ] mlowicki commented on CASSANDRA-10769:
--
Found this error on other node as well:
{code}
ERROR [ValidationExecutor:78] 2015-11-24 22:35:52,652 Validator.java:245 - Failed creating a merkle tree for [repair #93837260-92fb-11e5-b036-75bb514ae072 on sync/entity2, (-6012485790753833422,-6009995015166063234]], /10.210.3.221 (see log for details)
ERROR [ValidationExecutor:78] 2015-11-24 22:35:52,652 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:78,1,main]
java.lang.AssertionError: row DecoratedKey(-6012437544863914154, 000932373632373537303204c3c700) received out of order wrt DecoratedKey(-6009997709246787268, 000932373538333034303204c3c700)
        at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
{code}

> "received out of order wrt DecoratedKey" after scrub
>
> Key: CASSANDRA-10769
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10769
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.11, Debian Wheezy
> Reporter: mlowicki
>
> After running scrub and cleanup on all nodes in single data center I'm getting:
> {code}
> ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,530 Validator.java:245 - Failed creating a merkle tree for [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]], /10.210.3.221 (see log for details)
> ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,531 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:103,1,main]
> java.lang.AssertionError: row DecoratedKey(-5867787467868737053, 000932373633313036313204808800) received out of order wrt DecoratedKey(-5865937851627253360, 000933313230313737333204c3c700)
>         at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> {code}
> What I did is to run repair on other node:
> {code}
> time nodetool repair --in-local-dc
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-10769:
- Description:
After running scrub and cleanup on all nodes in single data center I'm getting:
{code}
ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,530 Validator.java:245 - Failed creating a merkle tree for [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]], /10.210.3.221 (see log for details)
ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,531 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:103,1,main]
java.lang.AssertionError: row DecoratedKey(-5867787467868737053, 000932373633313036313204808800) received out of order wrt DecoratedKey(-5865937851627253360, 000933313230313737333204c3c700)
        at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
{code}
What I did is to run repair on other node:
{code}
time nodetool repair --in-local-dc
{code}
Corresponding log on the node where repair has been started:
{code}
ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 RepairSession.java:303 - [repair #89fa2b70-933d-11e5-b036-75bb514ae072] session completed with the following error
org.apache.cassandra.exceptions.RepairException: [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]] Validation failed in /10.210.3.117
        at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
INFO [AntiEntropySessions:415] 2015-11-25 06:28:21,533 RepairSession.java:260 - [repair #b9458fa0-933d-11e5-b036-75bb514ae072] new session: will sync /10.210.3.221, /10.210.3.118, /10.210.3.117 on range (7119703141488009983,7129744584776466802] for sync.[device_token, entity2, user_stats, user_device, user_quota, user_store, user_device_progress, entity_by_id2]
ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 CassandraDaemon.java:227 - Exception in thread Thread[AntiEntropySessions:414,5,RMI Runtime]
java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]] Validation failed in /10.210.3.117
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]] Validation failed in /10.210.3.117
        at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at
[jira] [Updated] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-10769: - Reproduced In: 2.1.11 > "received out of order wrt DecoratedKey" after scrub > > > Key: CASSANDRA-10769 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10769 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.11, Debian Wheezy >Reporter: mlowicki > > After running scrub and cleanup on all nodes in single data center I'm > getting: > {code} > ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,530 Validator.java:245 - > Failed creating a merkle tree for [repair > #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, > (-5867793819051725444,-5865919628027816979]], /10.210.3.221 (see log for > details) > ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,531 > CassandraDaemon.java:227 - Exception in thread > Thread[ValidationExecutor:103,1,main] > java.lang.AssertionError: row DecoratedKey(-5867787467868737053, > 000932373633313036313204808800) received out of order wrt > DecoratedKey(-5865937851627253360, 000933313230313737333204c3c700) > at org.apache.cassandra.repair.Validator.add(Validator.java:127) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) 
[na:1.7.0_80] > {code} > What I did is to run repair on other node: > {code} > time nodetool repair --in-local-dc > {code} > Corresponding log on the node where repair has been started: > {code} > ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 > RepairSession.java:303 - [repair #89fa2b70-933d-11e5-b036-75bb514ae072] > session completed with the following error > org.apache.cassandra.exceptions.RepairException: [repair > #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, > (-5867793819051725444,-5865919628027816979]] Validation failed in > /10.210.3.117 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > INFO [AntiEntropySessions:415] 2015-11-25 06:28:21,533 > RepairSession.java:260 - [repair #b9458fa0-933d-11e5-b036-75bb514ae072] new > session: will sync /10.210.3.221, /10.210.3.118, /10.210.3.117 on range > (7119703141488009983,7129744584776466802] for sync.[device_token, entity2, > user_stats, user_device, user_quota, user_store, user_device_progress, > entity_by_id2] > ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 > CassandraDaemon.java:227 - Exception in thread > Thread[AntiEntropySessions:414,5,RMI Runtime] > java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: > [repair 
#89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, > (-5867793819051725444,-5865919628027816979]] Validation failed in > /10.210.3.117 > at com.google.common.base.Throwables.propagate(Throwables.java:160) > ~[guava-16.0.jar:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_80] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at >
[jira] [Commented] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026601#comment-15026601 ] mlowicki commented on CASSANDRA-10769:
--
Yeah, found CASSANDRA-9126 as well but decided to file a separate ticket as scrub didn't help in my case.

> "received out of order wrt DecoratedKey" after scrub
>
> Key: CASSANDRA-10769
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10769
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.11, Debian Wheezy
> Reporter: mlowicki
>
> After running scrub and cleanup on all nodes in single data center I'm getting:
> {code}
> ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,530 Validator.java:245 - Failed creating a merkle tree for [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]], /10.210.3.221 (see log for details)
> ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,531 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:103,1,main]
> java.lang.AssertionError: row DecoratedKey(-5867787467868737053, 000932373633313036313204808800) received out of order wrt DecoratedKey(-5865937851627253360, 000933313230313737333204c3c700)
>         at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
>         at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > {code} > What I did is to run repair on other node: > {code} > time nodetool repair --in-local-dc > {code} > Corresponding log on the node where repair has been started: > {code} > ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 > RepairSession.java:303 - [repair #89fa2b70-933d-11e5-b036-75bb514ae072] > session completed with the following error > org.apache.cassandra.exceptions.RepairException: [repair > #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, > (-5867793819051725444,-5865919628027816979]] Validation failed in > /10.210.3.117 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > INFO [AntiEntropySessions:415] 2015-11-25 06:28:21,533 > RepairSession.java:260 - [repair #b9458fa0-933d-11e5-b036-75bb514ae072] new > session: will sync /10.210.3.221, /10.210.3.118, /10.210.3.117 on range > (7119703141488009983,7129744584776466802] for sync.[device_token, entity2, > user_stats, user_device, user_quota, user_store, user_device_progress, > entity_by_id2] > ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 > CassandraDaemon.java:227 - Exception in 
thread > Thread[AntiEntropySessions:414,5,RMI Runtime] > java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: > [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, > (-5867793819051725444,-5865919628027816979]] Validation failed in > /10.210.3.117 > at com.google.common.base.Throwables.propagate(Throwables.java:160) > ~[guava-16.0.jar:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_80] > at
[jira] [Commented] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026657#comment-15026657 ] mlowicki commented on CASSANDRA-10769: -- I'm struggling with CASSANDRA-9935. > "received out of order wrt DecoratedKey" after scrub > > > Key: CASSANDRA-10769 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10769 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.11, Debian Wheezy >Reporter: mlowicki > > After running scrub and cleanup on all nodes in single data center I'm > getting: > {code} > ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,530 Validator.java:245 - > Failed creating a merkle tree for [repair > #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, > (-5867793819051725444,-5865919628027816979]], /10.210.3.221 (see log for > details) > ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,531 > CassandraDaemon.java:227 - Exception in thread > Thread[ValidationExecutor:103,1,main] > java.lang.AssertionError: row DecoratedKey(-5867787467868737053, > 000932373633313036313204808800) received out of order wrt > DecoratedKey(-5865937851627253360, 000933313230313737333204c3c700) > at org.apache.cassandra.repair.Validator.add(Validator.java:127) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] 
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > {code} > What I did is to run repair on other node: > {code} > time nodetool repair --in-local-dc > {code} > Corresponding log on the node where repair has been started: > {code} > ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 > RepairSession.java:303 - [repair #89fa2b70-933d-11e5-b036-75bb514ae072] > session completed with the following error > org.apache.cassandra.exceptions.RepairException: [repair > #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, > (-5867793819051725444,-5865919628027816979]] Validation failed in > /10.210.3.117 > at > org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > INFO [AntiEntropySessions:415] 2015-11-25 06:28:21,533 > RepairSession.java:260 - [repair #b9458fa0-933d-11e5-b036-75bb514ae072] new > session: will sync /10.210.3.221, /10.210.3.118, /10.210.3.117 on range > (7119703141488009983,7129744584776466802] for sync.[device_token, entity2, > user_stats, user_device, user_quota, user_store, user_device_progress, > entity_by_id2] > ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 > CassandraDaemon.java:227 - Exception in thread > Thread[AntiEntropySessions:414,5,RMI Runtime] > java.lang.RuntimeException: 
org.apache.cassandra.exceptions.RepairException: > [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, > (-5867793819051725444,-5865919628027816979]] Validation failed in > /10.210.3.117 > at com.google.common.base.Throwables.propagate(Throwables.java:160) > ~[guava-16.0.jar:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_80] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at >
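The AssertionError in this report fires inside Validator.add because a partition arrived whose token sorts before the previously added one, i.e. the SSTable's partitions are not in strictly increasing token order (the ordering that scrub is supposed to restore). A minimal Python sketch of that ordering invariant (illustrative only, not Cassandra's actual code; `validate_order` is a made-up name):

```python
def validate_order(tokens):
    """Partitions fed to a merkle-tree validator must arrive in strictly
    increasing token order; mimic the check behind Validator.add."""
    last = None
    for token in tokens:
        if last is not None and token <= last:
            raise AssertionError(
                f"row DecoratedKey({token}) received out of order "
                f"wrt DecoratedKey({last})")
        last = token
    return True

# The two tokens from the error above, in arrival order:
# -5867787467868737053 sorts *before* -5865937851627253360, so the check fires.
```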
[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025432#comment-15025432 ] mlowicki edited comment on CASSANDRA-9935 at 11/24/15 9:23 PM: --- Did find these session IDs on other nodes: * https://www.dropbox.com/s/qtx5rzmqzl9zj47/Screenshot%202015-11-24%2022.22.03.png?dl=0 * https://www.dropbox.com/s/o7k0cfhscd1au50/Screenshot%202015-11-24%2022.22.19.png?dl=0 was (Author: mlowicki): Did found these session IDs on other nodes. > Repair fails with RuntimeException > -- > > Key: CASSANDRA-9935 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9935 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.8, Debian Wheezy >Reporter: mlowicki >Assignee: Yuki Morishita > Fix For: 2.1.x > > Attachments: db1.sync.lati.osa.cassandra.log, > db5.sync.lati.osa.cassandra.log > > > We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade > to 2.1.8 it started to work faster but now it fails with: > {code} > ... 

> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde > for range (-5474076923322749342,-5468600594078911162] finished > [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde > for range (-8631877858109464676,-8624040066373718932] finished > [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde > for range (-5372806541854279315,-5369354119480076785] finished > [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde > for range (8166489034383821955,8168408930184216281] finished > [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde > for range (6084602890817326921,6088328703025510057] finished > [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde > for range (-781874602493000830,-781745173070807746] finished > [2015-07-29 20:44:03,957] Repair command #4 finished > error: nodetool failed, check server logs > -- StackTrace -- > java.lang.RuntimeException: nodetool failed, check server logs > at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) > at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) > {code} > After running: > {code} > nodetool repair --partitioner-range --parallel --in-local-dc sync > {code} > Last records in logs regarding repair are: > {code} > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range > (-7695808664784761779,-7693529816291585568] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range > (806371695398849,8065203836608925992] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range > (-5474076923322749342,-5468600594078911162] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 
StorageService.java:2952 - > Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range > (-8631877858109464676,-8624040066373718932] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range > (-5372806541854279315,-5369354119480076785] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range > (8166489034383821955,8168408930184216281] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range > (6084602890817326921,6088328703025510057] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range > (-781874602493000830,-781745173070807746] finished > {code} > but a bit above I see (at least two times in attached log): > {code} > ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - > Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range > (5765414319217852786,5781018794516851576] failed with error > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > at
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025314#comment-15025314 ] mlowicki commented on CASSANDRA-9935: - Launched repair and got the same exception after couple of days but grepped through logs and found: {code} ERROR [Thread-7155] 2015-11-24 17:38:24,895 StorageService.java:2999 - Repair session 3c9f7d40-8e19-11e5-bda4-0d9c8928349f for range (-1741218705797202342,-1741060704162047213] failed with error java.io.IOException: Failed during snapshot creation. java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation. at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80] at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80] at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2990) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] Caused by: java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation. at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80] ... 
1 common frames omitted Caused by: java.io.IOException: Failed during snapshot creation. at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:146) ~[apache-cassandra-2.1.11.jar:2.1.11] at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na] ... 3 common frames omitted {code} Additionally: {code} ERROR [Thread-7155] 2015-11-24 17:38:24,907 StorageService.java:2999 - Repair session b55b4930-8e73-11e5-bda4-0d9c8928349f for range (5801873202797297113,5802832998541920530] failed with error org.apache.cassandra.exceptions.RepairException: [repair #b55b4930-8e73-11e5-bda4-0d9c8928349f on sync/entity2, (5801873202797297113,5802832998541920530]] Validation failed in /10.195.15.167 java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #b55b4930-8e73-11e5-bda4-0d9c8928349f on sync/entity2, (5801873202797297113,5802832998541920530]] Validation failed in /10.195.15.167 at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80] at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80] at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2990) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #b55b4930-8e73-11e5-bda4-0d9c8928349f on sync/entity2, (5801873202797297113,5802832998541920530]] Validation failed in /10.195.15.167 at 
com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80] ... 1 common frames omitted Caused by: org.apache.cassandra.exceptions.RepairException: [repair #b55b4930-8e73-11e5-bda4-0d9c8928349f on sync/entity2,
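The troubleshooting step described in these comments — pulling the failed repair session IDs out of one node's system.log so they can be searched for on the other replicas — can be scripted. A small, hypothetical helper (the `failed_sessions` name and the regex are assumptions modeled on the log lines quoted above):

```python
import re

# Match lines like:
#   ERROR ... Repair session <uuid> for range (...] failed with error ...
FAILED = re.compile(
    r"Repair session (?P<sid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) "
    r"for range .* failed with error")

def failed_sessions(log_lines):
    """Return the sorted, de-duplicated session IDs of failed repair sessions."""
    return sorted({m.group("sid") for line in log_lines
                   if (m := FAILED.search(line))})

sample = [
    "ERROR [Thread-7155] 2015-11-24 17:38:24,895 StorageService.java:2999 - "
    "Repair session 3c9f7d40-8e19-11e5-bda4-0d9c8928349f for range "
    "(-1741218705797202342,-1741060704162047213] failed with error "
    "java.io.IOException: Failed during snapshot creation.",
    "INFO [Thread-173887] Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde "
    "for range (-7695808664784761779,-7693529816291585568] finished",
]
print(failed_sessions(sample))  # ['3c9f7d40-8e19-11e5-bda4-0d9c8928349f']
```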
[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025432#comment-15025432 ] mlowicki edited comment on CASSANDRA-9935 at 11/24/15 9:25 PM: --- Didn't find these session IDs on other nodes: * https://www.dropbox.com/s/qtx5rzmqzl9zj47/Screenshot%202015-11-24%2022.22.03.png?dl=0 * https://www.dropbox.com/s/o7k0cfhscd1au50/Screenshot%202015-11-24%2022.22.19.png?dl=0 was (Author: mlowicki): Did found these session IDs on other nodes: * https://www.dropbox.com/s/qtx5rzmqzl9zj47/Screenshot%202015-11-24%2022.22.03.png?dl=0 * https://www.dropbox.com/s/o7k0cfhscd1au50/Screenshot%202015-11-24%2022.22.19.png?dl=0 > Repair fails with RuntimeException > -- > > Key: CASSANDRA-9935 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9935 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.8, Debian Wheezy >Reporter: mlowicki >Assignee: Yuki Morishita > Fix For: 2.1.x > > Attachments: db1.sync.lati.osa.cassandra.log, > db5.sync.lati.osa.cassandra.log > > > We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade > to 2.1.8 it started to work faster but now it fails with: > {code} > ... 
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde > for range (-5474076923322749342,-5468600594078911162] finished > [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde > for range (-8631877858109464676,-8624040066373718932] finished > [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde > for range (-5372806541854279315,-5369354119480076785] finished > [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde > for range (8166489034383821955,8168408930184216281] finished > [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde > for range (6084602890817326921,6088328703025510057] finished > [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde > for range (-781874602493000830,-781745173070807746] finished > [2015-07-29 20:44:03,957] Repair command #4 finished > error: nodetool failed, check server logs > -- StackTrace -- > java.lang.RuntimeException: nodetool failed, check server logs > at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) > at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) > {code} > After running: > {code} > nodetool repair --partitioner-range --parallel --in-local-dc sync > {code} > Last records in logs regarding repair are: > {code} > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range > (-7695808664784761779,-7693529816291585568] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range > (806371695398849,8065203836608925992] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range > (-5474076923322749342,-5468600594078911162] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 
StorageService.java:2952 - > Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range > (-8631877858109464676,-8624040066373718932] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range > (-5372806541854279315,-5369354119480076785] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range > (8166489034383821955,8168408930184216281] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range > (6084602890817326921,6088328703025510057] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range > (-781874602493000830,-781745173070807746] finished > {code} > but a bit above I see (at least two times in attached log): > {code} > ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - > Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range > (5765414319217852786,5781018794516851576] failed with error > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > org.apache.cassandra.exceptions.RepairException: [repair >
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025516#comment-15025516 ] mlowicki commented on CASSANDRA-9935: - Nothing found. Checked system.log.1.zip from /var/log/cassandra on each box but only on db8.lati (where repair started) found those session IDs. > Repair fails with RuntimeException > -- > > Key: CASSANDRA-9935 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9935 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.8, Debian Wheezy >Reporter: mlowicki >Assignee: Yuki Morishita > Fix For: 2.1.x > > Attachments: db1.sync.lati.osa.cassandra.log, > db5.sync.lati.osa.cassandra.log > > > We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade > to 2.1.8 it started to work faster but now it fails with: > {code} > ... > [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde > for range (-5474076923322749342,-5468600594078911162] finished > [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde > for range (-8631877858109464676,-8624040066373718932] finished > [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde > for range (-5372806541854279315,-5369354119480076785] finished > [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde > for range (8166489034383821955,8168408930184216281] finished > [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde > for range (6084602890817326921,6088328703025510057] finished > [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde > for range (-781874602493000830,-781745173070807746] finished > [2015-07-29 20:44:03,957] Repair command #4 finished > error: nodetool failed, check server logs > -- StackTrace -- > java.lang.RuntimeException: nodetool failed, check server logs > at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) > at 
org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) > {code} > After running: > {code} > nodetool repair --partitioner-range --parallel --in-local-dc sync > {code} > Last records in logs regarding repair are: > {code} > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range > (-7695808664784761779,-7693529816291585568] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range > (806371695398849,8065203836608925992] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range > (-5474076923322749342,-5468600594078911162] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range > (-8631877858109464676,-8624040066373718932] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range > (-5372806541854279315,-5369354119480076785] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range > (8166489034383821955,8168408930184216281] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range > (6084602890817326921,6088328703025510057] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range > (-781874602493000830,-781745173070807746] finished > {code} > but a bit above I see (at least two times in attached log): > {code} > ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - > Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range > 
(5765414319217852786,5781018794516851576] failed with error > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > [na:1.7.0_80] > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > [na:1.7.0_80] > at >
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025432#comment-15025432 ] mlowicki commented on CASSANDRA-9935: - Did found these session IDs on other nodes. > Repair fails with RuntimeException > -- > > Key: CASSANDRA-9935 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9935 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.8, Debian Wheezy >Reporter: mlowicki >Assignee: Yuki Morishita > Fix For: 2.1.x > > Attachments: db1.sync.lati.osa.cassandra.log, > db5.sync.lati.osa.cassandra.log > > > We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade > to 2.1.8 it started to work faster but now it fails with: > {code} > ... > [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde > for range (-5474076923322749342,-5468600594078911162] finished > [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde > for range (-8631877858109464676,-8624040066373718932] finished > [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde > for range (-5372806541854279315,-5369354119480076785] finished > [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde > for range (8166489034383821955,8168408930184216281] finished > [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde > for range (6084602890817326921,6088328703025510057] finished > [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde > for range (-781874602493000830,-781745173070807746] finished > [2015-07-29 20:44:03,957] Repair command #4 finished > error: nodetool failed, check server logs > -- StackTrace -- > java.lang.RuntimeException: nodetool failed, check server logs > at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) > at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) > {code} > After running: > {code} > nodetool repair 
--partitioner-range --parallel --in-local-dc sync > {code} > Last records in logs regarding repair are: > {code} > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range > (-7695808664784761779,-7693529816291585568] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range > (806371695398849,8065203836608925992] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range > (-5474076923322749342,-5468600594078911162] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range > (-8631877858109464676,-8624040066373718932] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range > (-5372806541854279315,-5369354119480076785] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range > (8166489034383821955,8168408930184216281] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range > (6084602890817326921,6088328703025510057] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range > (-781874602493000830,-781745173070807746] finished > {code} > but a bit above I see (at least two times in attached log): > {code} > ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - > Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range > (5765414319217852786,5781018794516851576] failed with error > org.apache.cassandra.exceptions.RepairException: [repair > 
#1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > [na:1.7.0_80] > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > [na:1.7.0_80] > at > org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) > ~[apache-cassandra-2.1.8.jar:2.1.8] > at >
[jira] [Created] (CASSANDRA-10744) Option to monitor pending compaction tasks per type
mlowicki created CASSANDRA-10744: Summary: Option to monitor pending compaction tasks per type Key: CASSANDRA-10744 URL: https://issues.apache.org/jira/browse/CASSANDRA-10744 Project: Cassandra Issue Type: Wish Reporter: mlowicki Attachments: compaction_monitoring.png There is {{org.apache.cassandra.metrics:type=ColumnFamily,name=PendingCompactions}} which can help visualise the number of pending compaction tasks (see attached screenshot). Unfortunately there is no way to distinguish what kinds of tasks sit in this queue, e.g. how many SCRUB, COMPACTION, VALIDATION, CLEANUP or INDEX_BUILD tasks are queued. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
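In the absence of a per-type metric, one workaround is to tally the currently *active* tasks per operation type from `nodetool compactionstats` output. The sketch below assumes a column layout like the hypothetical sample shown (the exact format and type-name capitalization vary by Cassandra version), and `tasks_by_type` is a made-up helper name:

```python
from collections import Counter

# Operation types of interest (names taken from the ticket; the capitalized
# forms below are an assumption about how compactionstats prints them).
TYPES = ("Compaction", "Validation", "Cleanup", "Scrub", "Index_build")

def tasks_by_type(compactionstats_output):
    """Tally active tasks per operation type from `nodetool compactionstats`."""
    counts = Counter()
    for line in compactionstats_output.splitlines():
        cols = line.split()
        if cols and cols[0] in TYPES:
            counts[cols[0]] += 1
    return counts

# Hypothetical sample output:
sample = """\
pending tasks: 5
   compaction type   keyspace   table          completed   total    unit   progress
        Compaction       sync   entity2           123456   999999   bytes  12.35%
        Validation       sync   entity_by_id2       1024     4096   bytes  25.00%
        Validation       sync   user_stats            10      100   bytes  10.00%
"""
print(tasks_by_type(sample))  # Counter({'Validation': 2, 'Compaction': 1})
```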
[jira] [Created] (CASSANDRA-10697) Leak detected while running offline scrub
mlowicki created CASSANDRA-10697: Summary: Leak detected while running offline scrub Key: CASSANDRA-10697 URL: https://issues.apache.org/jira/browse/CASSANDRA-10697 Project: Cassandra Issue Type: Bug Environment: C* 2.1.9 on Debian Wheezy Reporter: mlowicki Priority: Critical I got couple of those: {code} ERROR 05:09:15 LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3b60e162) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@1433208674:/var/lib/cassandra/data/sync/entity2-e24b5040199b11e5a30f75bb514ae072/sync-entity2-ka-405434 was not released before the reference was garbage collected {code} and then: {code} Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:99) at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:81) at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:353) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:444) at java.io.RandomAccessFile.readFully(RandomAccessFile.java:424) at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:378) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:348) at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:327) at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397) at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:75) at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:120) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at com.google.common.collect.Iterators$7.computeNext(Iterators.java:645) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(ColumnIndex.java:165) at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:121) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:192) at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:127) at org.apache.cassandra.io.sstable.SSTableRewriter.tryAppend(SSTableRewriter.java:158) at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:220) at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:116) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
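The "LEAK DETECTED" line comes from Cassandra's reference-leak tracking: a Ref to an SSTableReader was garbage collected before it was released. A toy Python analogue of that detection mechanism (illustrative only; the class shape and the second sstable name are invented, and Cassandra's real Ref machinery is far more involved):

```python
import gc
import weakref

leaks = []

class Ref:
    """Toy analogue of org.apache.cassandra.utils.concurrent.Ref: if an
    instance is garbage collected before release() is called, record a leak."""
    def __init__(self, name):
        self._state = {"name": name, "released": False}
        state = self._state  # the finalizer callback must not capture `self`
        def on_collect():
            if not state["released"]:
                leaks.append("LEAK DETECTED: " + state["name"])
        self._finalizer = weakref.finalize(self, on_collect)

    def release(self):
        self._state["released"] = True

r1 = Ref("sync-entity2-ka-405434")  # sstable name from the log above
r1.release()                        # properly released: no leak recorded
r2 = Ref("sync-entity2-ka-405435")  # hypothetical sstable, never released
del r1, r2
gc.collect()
print(leaks)  # ['LEAK DETECTED: sync-entity2-ka-405435']
```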
[jira] [Created] (CASSANDRA-10689) java.lang.OutOfMemoryError: Direct buffer memory
mlowicki created CASSANDRA-10689: Summary: java.lang.OutOfMemoryError: Direct buffer memory Key: CASSANDRA-10689 URL: https://issues.apache.org/jira/browse/CASSANDRA-10689 Project: Cassandra Issue Type: Bug Reporter: mlowicki Fix For: 2.1.11 {code} ERROR [SharedPool-Worker-63] 2015-11-11 17:53:16,161 JVMStabilityInspector.java:117 - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_80] at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) ~[na:1.7.0_80] at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_80] at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174) ~[na:1.7.0_80] at sun.nio.ch.IOUtil.read(IOUtil.java:195) ~[na:1.7.0_80] at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:149) ~[na:1.7.0_80] at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:104) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:81) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:310) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1894) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.columniterator.IndexedSliceReader.(IndexedSliceReader.java:83) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) ~[apache-cassandra-2.1.11.jar:2.1.11] at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:42) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1994) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1837) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80] at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.11.jar:2.1.11] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
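The `java.lang.OutOfMemoryError: Direct buffer memory` above is raised when the JVM exhausts its off-heap (direct) buffer pool, whose ceiling is controlled by `-XX:MaxDirectMemorySize` and defaults to roughly the maximum heap size when unset. A minimal sketch of raising that ceiling in `conf/cassandra-env.sh` — the 2G figure is an illustrative assumption, not a value recommended anywhere in this ticket:

```shell
# Sketch only: append a higher direct-memory ceiling to the JVM options
# that cassandra-env.sh accumulates. 2G is an assumed placeholder value;
# size it against the node's available RAM and workload.
JVM_OPTS="$JVM_OPTS -XX:MaxDirectMemorySize=2G"
```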
[jira] [Commented] (CASSANDRA-10689) java.lang.OutOfMemoryError: Direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000901#comment-15000901 ] mlowicki commented on CASSANDRA-10689: -- After upgrade from 2.1.9 to 2.1.11 two days ago I'm getting lots of: {code} WARN [SharedPool-Worker-28] 2015-11-11 19:01:22,409 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-28,5,main]: {} org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data2/sync/entity2-e24b5040199b11e5a30f75bb514ae072/sync-entity2-ka-392603-Data.db): corruption detected, chunk at 11612338 of length 156219476. at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:85) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:310) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1894) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.columniterator.IndexedSliceReader.(IndexedSliceReader.java:83) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:42) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) ~[apache-cassandra-2.1.11.jar:2.1.11] at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1994) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1837) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80] at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.11.jar:2.1.11] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data2/sync/entity2-e24b5040199b11e5a30f75bb514ae072/sync-entity2-ka-392603-Data.db): corruption detected, chunk at 11612338 of length 156219476. 
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:116) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:81) ~[apache-cassandra-2.1.11.jar:2.1.11] ... 21 common frames omitted Caused by: java.io.IOException: Compressed lengths mismatch at org.apache.cassandra.io.compress.LZ4Compressor.uncompress(LZ4Compressor.java:98) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:112) ~[apache-cassandra-2.1.11.jar:2.1.11] ... 22 common frames omitted {code} On 3 out of 7 nodes in one data center. > java.lang.OutOfMemoryError: Direct buffer memory > > > Key: CASSANDRA-10689 > URL:
[jira] [Updated] (CASSANDRA-10689) java.lang.OutOfMemoryError: Direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-10689: - Reproduced In: 2.1.11 Fix Version/s: (was: 2.1.11) > java.lang.OutOfMemoryError: Direct buffer memory > > > Key: CASSANDRA-10689 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10689 > Project: Cassandra > Issue Type: Bug >Reporter: mlowicki > > {code} > ERROR [SharedPool-Worker-63] 2015-11-11 17:53:16,161 > JVMStabilityInspector.java:117 - JVM state determined to be unstable. > Exiting forcefully due to: > java.lang.OutOfMemoryError: Direct buffer memory > at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_80] > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > ~[na:1.7.0_80] > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) > ~[na:1.7.0_80] > at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174) > ~[na:1.7.0_80] > at sun.nio.ch.IOUtil.read(IOUtil.java:195) ~[na:1.7.0_80] > at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:149) > ~[na:1.7.0_80] > at > org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:104) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:81) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:310) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:64) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1894) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > 
org.apache.cassandra.db.columniterator.IndexedSliceReader.(IndexedSliceReader.java:83) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:42) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1994) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1837) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_80] > at > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-2.1.11.jar:2.1.11] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > {code}
[jira] [Commented] (CASSANDRA-10689) java.lang.OutOfMemoryError: Direct buffer memory
[ https://issues.apache.org/jira/browse/CASSANDRA-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001037#comment-15001037 ] mlowicki commented on CASSANDRA-10689: -- Running {{scrub}} on nodes with corrupted blocks gives: {code} root@db7:~# time nodetool scrub sync entity2 error: null -- StackTrace -- java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:267) at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:214) at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161) at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source) at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1022) at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:292) at com.sun.proxy.$Proxy7.scrub(Unknown Source) at org.apache.cassandra.tools.NodeProbe.scrub(NodeProbe.java:247) at org.apache.cassandra.tools.NodeProbe.scrub(NodeProbe.java:266) at org.apache.cassandra.tools.NodeTool$Scrub.execute(NodeTool.java:1277) at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:289) at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:203) real11m38.347s user0m2.356s sys 0m0.168s {code} > java.lang.OutOfMemoryError: Direct buffer memory > > > Key: CASSANDRA-10689 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10689 > Project: Cassandra > Issue Type: Bug >Reporter: mlowicki > > {code} > ERROR [SharedPool-Worker-63] 2015-11-11 17:53:16,161 > JVMStabilityInspector.java:117 - JVM state determined to be unstable. 
> Exiting forcefully due to: > java.lang.OutOfMemoryError: Direct buffer memory > at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_80] > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > ~[na:1.7.0_80] > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) > ~[na:1.7.0_80] > at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174) > ~[na:1.7.0_80] > at sun.nio.ch.IOUtil.read(IOUtil.java:195) ~[na:1.7.0_80] > at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:149) > ~[na:1.7.0_80] > at > org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:104) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:81) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:310) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:64) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1894) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.columniterator.IndexedSliceReader.(IndexedSliceReader.java:83) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:42) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1994) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1837) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
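The scrub failure quoted above is a client-side `java.io.EOFException` from the JMX/RMI transport: `nodetool` lost its connection after ~11 minutes, which does not necessarily mean the server-side scrub stopped. A hedged alternative (an assumption, not something confirmed in this thread: it presumes the node can be taken offline) is to run the scrub without JMX at all, using the offline `sstablescrub` tool that ships in the 2.1 line; the keyspace/table names are the ones from the report:

```shell
# Sketch, not a prescription: offline scrub avoids the long-lived JMX
# connection entirely. Run only while Cassandra is stopped on this node.
sudo service cassandra stop
sstablescrub sync entity2
sudo service cassandra start
```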
[jira] [Updated] (CASSANDRA-10676) AssertionError in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-10676: - Fix Version/s: 2.1.9 > AssertionError in CompactionExecutor > > > Key: CASSANDRA-10676 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10676 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.9 >Reporter: mlowicki > Fix For: 2.1.9 > > > {code} > ERROR [CompactionExecutor:33329] 2015-11-09 08:16:22,759 > CassandraDaemon.java:223 - Exception in thread > Thread[CompactionExecutor:33329,1,main] > java.lang.AssertionError: > /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-888705-Data.db > at > org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:279) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_80] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) 
[na:1.7.0_80] > ^C > root@db1:~# tail -f /var/log/cassandra/system.log > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_80] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10676) AssertionError in CompactionExecutor
mlowicki created CASSANDRA-10676: Summary: AssertionError in CompactionExecutor Key: CASSANDRA-10676 URL: https://issues.apache.org/jira/browse/CASSANDRA-10676 Project: Cassandra Issue Type: Bug Environment: C* 2.1.9 Reporter: mlowicki {code} ERROR [CompactionExecutor:33329] 2015-11-09 08:16:22,759 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:33329,1,main] java.lang.AssertionError: /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-888705-Data.db at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:279) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) ~[apache-cassandra-2.1.9.jar:2.1.9] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] ^C root@db1:~# tail -f /var/log/cassandra/system.log at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) ~[apache-cassandra-2.1.9.jar:2.1.9] at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.9.jar:2.1.9] at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) ~[apache-cassandra-2.1.9.jar:2.1.9] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
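The `AssertionError` above points at an sstable of the internal `system.compactions_in_progress` table, which in the 2.1 line could be left inconsistent after an unclean shutdown. A commonly used workaround — an assumption here, not a fix confirmed in this ticket — is to delete that table's sstables while the node is down; the table only holds transient compaction bookkeeping and is rebuilt on restart:

```shell
# Hedged workaround sketch: remove the transient compactions_in_progress
# sstables on a STOPPED node. The data directory matches the path in the
# stack trace above; adjust if your data directories differ.
sudo service cassandra stop
rm -f /var/lib/cassandra/data/system/compactions_in_progress-*/*
sudo service cassandra start
```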
[jira] [Updated] (CASSANDRA-10676) AssertionError in CompactionExecutor
[ https://issues.apache.org/jira/browse/CASSANDRA-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-10676: - Environment: C* 2.1.9 on Debian Wheezy (was: C* 2.1.9) > AssertionError in CompactionExecutor > > > Key: CASSANDRA-10676 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10676 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.9 on Debian Wheezy >Reporter: mlowicki > Fix For: 2.1.9 > > > {code} > ERROR [CompactionExecutor:33329] 2015-11-09 08:16:22,759 > CassandraDaemon.java:223 - Exception in thread > Thread[CompactionExecutor:33329,1,main] > java.lang.AssertionError: > /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-888705-Data.db > at > org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:279) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_80] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > 
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > ^C > root@db1:~# tail -f /var/log/cassandra/system.log > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:151) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236) > ~[apache-cassandra-2.1.9.jar:2.1.9] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_80] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8696) nodetool repair on cassandra 2.1.2 keyspaces return java.lang.RuntimeException: Could not create snapshot
[ https://issues.apache.org/jira/browse/CASSANDRA-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790743#comment-14790743 ] mlowicki commented on CASSANDRA-8696: - [~folex] [~yukim] looks like this is the same as CASSANDRA-9935. In my C* 2.1.8 cluster it's 100% reproducible. > nodetool repair on cassandra 2.1.2 keyspaces return > java.lang.RuntimeException: Could not create snapshot > - > > Key: CASSANDRA-8696 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8696 > Project: Cassandra > Issue Type: Bug >Reporter: Jeff Liu >Assignee: Yuki Morishita > Fix For: 2.1.x > > Attachments: Logs.zip > > > When trying to run nodetool repair -pr on cassandra node ( 2.1.2), cassandra > throw java exceptions: cannot create snapshot. > the error log from system.log: > {noformat} > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:28,815 > StreamResultFuture.java:166 - [Stream #692c1450-a692-11e4-9973-070e938df227 > ID#0] Prepare completed. Receiving 2 files(221187 bytes), sending 5 > files(632105 bytes) > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 > StreamResultFuture.java:180 - [Stream #692c1450-a692-11e4-9973-070e938df227] > Session with /10.97.9.110 is complete > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 > StreamResultFuture.java:212 - [Stream #692c1450-a692-11e4-9973-070e938df227] > All sessions completed > INFO [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,047 > StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] > streaming task succeed, returning response to /10.98.194.68 > INFO [RepairJobTask:1] 2015-01-28 02:07:29,065 StreamResultFuture.java:86 - > [Stream #692c6270-a692-11e4-9973-070e938df227] Executing streaming plan for > Repair > INFO [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,065 > StreamSession.java:213 - [Stream #692c6270-a692-11e4-9973-070e938df227] > Starting streaming to /10.66.187.201 > INFO [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,070 > StreamCoordinator.java:209 -
[Stream #692c6270-a692-11e4-9973-070e938df227, > ID#0] Beginning stream session with /10.66.187.201 > INFO [STREAM-IN-/10.66.187.201] 2015-01-28 02:07:29,465 > StreamResultFuture.java:166 - [Stream #692c6270-a692-11e4-9973-070e938df227 > ID#0] Prepare completed. Receiving 5 files(627994 bytes), sending 5 > files(632105 bytes) > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,971 > StreamResultFuture.java:180 - [Stream #692c6270-a692-11e4-9973-070e938df227] > Session with /10.66.187.201 is complete > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,972 > StreamResultFuture.java:212 - [Stream #692c6270-a692-11e4-9973-070e938df227] > All sessions completed > INFO [StreamReceiveTask:22] 2015-01-28 02:07:31,972 > StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] > streaming task succeed, returning response to /10.98.194.68 > ERROR [RepairJobTask:1] 2015-01-28 02:07:39,444 RepairJob.java:127 - Error > occurred during snapshot phase > java.lang.RuntimeException: Could not create snapshot at /10.97.9.110 > at > org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > ~[na:1.7.0_45] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ~[na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > INFO [AntiEntropySessions:6] 2015-01-28 02:07:39,445 RepairSession.java:260 > - [repair #6f85e740-a692-11e4-9973-070e938df227] new session: will sync > /10.98.194.68, /10.66.187.201, /10.226.218.135 on range > (12817179804668051873746972069086 > 
2638799,12863540308359254031520865977436165] for events.[bigint0text, > bigint0boolean, bigint0int, dataset_catalog, column_categories, > bigint0double, bigint0bigint] > ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,445 RepairSession.java:303 > - [repair #685e3d00-a692-11e4-9973-070e938df227] session completed with the > following error > java.io.IOException: Failed during snapshot creation. > at > org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) > ~[apache-cassandra-2.1.2.jar:2.1.2] > at > org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) >
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731416#comment-14731416 ] mlowicki commented on CASSANDRA-9935: - [~yukim] I've launched repair for all keyspaces {{nodetool repair --in-local-dc --parallel}}. #1 was for "OpsCenter", #2 for sync which is mentioned above in this thread, #3 for system_traces. Part of the output in https://cpaste.org/plvyleda5. Interestingly, it says: {code} [2015-09-04 18:07:55,588] Repair command #2 finished {code} Maybe the assertion error occurs while outputting results, as repair for the sync keyspace always fails after a similar time period? > Repair fails with RuntimeException > -- > > Key: CASSANDRA-9935 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9935 > Project: Cassandra > Issue Type: Bug > Environment: C* 2.1.8, Debian Wheezy >Reporter: mlowicki >Assignee: Yuki Morishita > Attachments: db1.sync.lati.osa.cassandra.log, > db5.sync.lati.osa.cassandra.log > > > We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade > to 2.1.8 it started to work faster but now it fails with: > {code} > ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde > for range (-5474076923322749342,-5468600594078911162] finished > [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde > for range (-8631877858109464676,-8624040066373718932] finished > [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde > for range (-5372806541854279315,-5369354119480076785] finished > [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde > for range (8166489034383821955,8168408930184216281] finished > [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde > for range (6084602890817326921,6088328703025510057] finished > [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde > for range (-781874602493000830,-781745173070807746] finished > [2015-07-29 20:44:03,957] Repair command #4 finished > error: nodetool failed, check server logs > -- StackTrace -- > java.lang.RuntimeException: nodetool failed, check server logs > at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290) > at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) > {code} > After running: > {code} > nodetool repair --partitioner-range --parallel --in-local-dc sync > {code} > Last records in logs regarding repair are: > {code} > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range > (-7695808664784761779,-7693529816291585568] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range > (806371695398849,8065203836608925992] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - > Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range > (-5474076923322749342,-5468600594078911162] finished > INFO [Thread-173887] 2015-07-29 20:44:03,956 
StorageService.java:2952 - > Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range > (-8631877858109464676,-8624040066373718932] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range > (-5372806541854279315,-5369354119480076785] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range > (8166489034383821955,8168408930184216281] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range > (6084602890817326921,6088328703025510057] finished > INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - > Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range > (-781874602493000830,-781745173070807746] finished > {code} > but a bit above I see (at least two times in attached log): > {code} > ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - > Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range > (5765414319217852786,5781018794516851576] failed with error > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162 > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > org.apache.cassandra.exceptions.RepairException: [repair > #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, > (5765414319217852786,5781018794516851576]]
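The ERROR lines quoted above follow a fixed format, so the failing endpoint and token range can be pulled out mechanically when triaging many such failures. A minimal sketch in Python; the regexes and the `failed_validations` helper are assumptions based only on the 2.1.8 log lines quoted in this ticket, not a Cassandra API:

```python
import re

# Assumed message formats, modeled on the StorageService lines quoted above.
FAIL_RE = re.compile(
    r"Repair session (?P<session>[0-9a-f-]+) for range "
    r"\((?P<start>-?\d+),(?P<end>-?\d+)\] failed with error"
)
ENDPOINT_RE = re.compile(r"Validation failed in /(?P<ip>[\d.]+)")

def failed_validations(lines):
    """Yield (session_id, (start_token, end_token), endpoint) for each
    repair session that failed validation."""
    pending = None
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            pending = (m.group("session"), (m.group("start"), m.group("end")))
        e = ENDPOINT_RE.search(line)
        if e and pending:
            yield pending + (e.group("ip"),)
            pending = None
```

Run over the attached system.log files, this would point at /10.195.15.162 for the excerpts above, i.e. the node whose validation compaction is worth checking first.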
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731811#comment-14731811 ] mlowicki commented on CASSANDRA-9935: - [~yukim] how can I detect whether repair succeeded? We restarted all nodes a couple of days ago, so that didn't help.
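On the question of detecting whether repair succeeded: in the output quoted in this ticket, nodetool still prints "Repair command #N finished" even though individual sessions failed, so that line alone is not a success signal. A sketch of a check over captured nodetool output; `repair_succeeded` is a hypothetical helper built on the 2.1-era message strings quoted here, not a Cassandra API:

```python
def repair_succeeded(output_lines):
    """Return True only if at least one session finished and none failed.

    Note: "Repair command #N finished" appears even after failed sessions
    (see the ticket output), so we look at per-session lines instead.
    """
    finished = sum(1 for l in output_lines
                   if "Repair session" in l and "finished" in l)
    failed = sum(1 for l in output_lines if "failed with error" in l)
    nodetool_error = any("error: nodetool failed" in l for l in output_lines)
    return finished > 0 and failed == 0 and not nodetool_error
```

In practice this would wrap something like `nodetool repair ... | tee repair.log`, with the lines of repair.log fed in afterwards.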
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681935#comment-14681935 ] mlowicki commented on CASSANDRA-9935: - [~yukim] any updates?
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659666#comment-14659666 ] mlowicki commented on CASSANDRA-9935: - [~yukim] I've launched repair in the 2nd DC to get more logs from repair - https://gist.github.com/mlowicki/43e3074f46f12737577e. I've found two exceptions: {code} [2015-08-06 03:03:33,231] Repair session d4f0d420-3baa-11e5-9ec3-75bb514ae072 for range (-144620433819156,-1424504876804571443] failed with error org.apache.cassandra.exceptions.RepairException: [repair #d4f0d420-3baa-11e5-9ec3-75bb514ae072 on sync/entity2, (-144620433819156,-1424504876804571443]] Validation failed in /10.210.3.162 {code} and {code} [2015-08-06 03:03:33,239] Repair session 967ca730-3bb1-11e5-9ec3-75bb514ae072 for range (3125697280560263437,3131751716701120659] failed with error org.apache.cassandra.exceptions.RepairException: [repair #967ca730-3bb1-11e5-9ec3-75bb514ae072 on sync/entity_by_id2, (3125697280560263437,3131751716701120659]] Validation failed in /10.210.3.221 {code} 10.210.3.162 = db6.sync.ams.osa, 10.210.3.221 = db1.sync.ams.osa. Repair was started on db1.sync.ams.osa. I see no errors on db6.sync.ams.osa in system.log from 2015-08-06 00:24:16,322 to 2015-08-06 08:04:58,283 (no ERROR string there). On db1.sync.ams.osa I've found two errors - https://gist.github.com/mlowicki/3bf39f9f9ad0d4e202e5. I've launched {{nodetool scrub}} on db6.sync.ams.osa and will send the logs when it finishes.
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660075#comment-14660075 ] mlowicki commented on CASSANDRA-9935: - Logs from db6.sync.ams.osa where scrub was started - https://drive.google.com/file/d/0B_8mc_afWmd2NjZXZGJRRnI4TzA/view?usp=sharing
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654970#comment-14654970 ] mlowicki commented on CASSANDRA-9935: - [~yukim] the same error after ~12 hours: {code} [2015-08-05 06:35:07,340] Repair session 18f8c020-3b3c-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
[2015-08-05 06:35:07,340] Repair command #6 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202) {code} Logs from db1.sync.lati.osa (10.195.15.162) - https://drive.google.com/file/d/0B_8mc_afWmd2LWcxRWRPWTFnMlk/view?usp=sharing Logs from db4.sync.lati.osa (10.195.15.167) - https://drive.google.com/file/d/0B_8mc_afWmd2ejVnR24tVm5OZUk/view?usp=sharing
[jira] [Commented] (CASSANDRA-9702) Repair running really slow
[ https://issues.apache.org/jira/browse/CASSANDRA-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654981#comment-14654981 ] mlowicki commented on CASSANDRA-9702: - After upgrading to 2.1.8 we're seeing CASSANDRA-9935 instead. Repair running really slow -- Key: CASSANDRA-9702 URL: https://issues.apache.org/jira/browse/CASSANDRA-9702 Project: Cassandra Issue Type: Bug Environment: C* 2.1.7, Debian Wheezy Reporter: mlowicki Fix For: 2.1.x Attachments: db1.system.log We've been using 2.1.x since the very beginning and we have always had problems with failing or slow repair. In one data center we haven't been able to finish repair for many weeks (partially because of CASSANDRA-9681, as we needed to reboot nodes periodically). I launched it this morning (12 hours ago now) and am monitoring it using https://github.com/spotify/cassandra-opstools/blob/master/bin/spcassandra-repairstats. For the first hour it progressed to 9.43% but then it took ~10 hours to reach 9.44%. I very rarely see logs related to repair (every 15-20 minutes, but sometimes nothing new for an hour). Repair launched with: {code} nodetool repair --partitioner-range --parallel --in-local-dc {keyspace} {code} Attached is the log file from today. We have ~4.1TB of data on 12 nodes with RF set to 3 (2 DCs with 6 nodes each). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
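The stall described here (9.43% after the first hour, then ~10 hours to reach 9.44%) is easy to flag automatically by sampling the percentage reported by spcassandra-repairstats over time. A sketch; `progress_rate` is a hypothetical helper, and collecting the (timestamp, percent) samples is assumed to happen elsewhere:

```python
from datetime import datetime, timedelta

def progress_rate(samples):
    """Given (timestamp, percent_complete) samples, return the rate in
    percent per hour between the first and last sample. A rate near zero
    over several hours suggests a stalled repair."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600.0
    return (p1 - p0) / hours if hours else 0.0
```

With the numbers from this comment, the first hour runs at ~9.43 %/hour and the next ten at ~0.001 %/hour, which an alert threshold (say, below 0.1 %/hour) would catch long before the repair "finishes".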
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658878#comment-14658878 ] mlowicki commented on CASSANDRA-9935: - It didn't print anything to the console on any of the nodes. I can grep through system.log or attach logs from each box if that helps.
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654169#comment-14654169 ] mlowicki commented on CASSANDRA-9935:
--
Just finished running {{nodetool scrub}} on all nodes in a single DC (took ~12 hours) and started repair.

> Repair fails with RuntimeException
> --
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.8, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Yuki Morishita
>         Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702). After upgrading to 2.1.8 it started to work faster, but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
>         at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
>         at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least two times in the attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range (5765414319217852786,5781018794516851576] failed with error org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80]
>         at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) ~[apache-cassandra-2.1.8.jar:2.1.8]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> {code}
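A quick way to triage a run like the one above is to count the ERROR records that carry "Validation failed", since the final "finished" lines hide them. This is only a sketch: the function and the here-doc sample (lines copied from this report) are illustrative, and the real input would be the node's system.log.

```shell
#!/bin/sh
# Sketch: after a failed
#   nodetool repair --partitioner-range --parallel --in-local-dc sync
# count repair sessions that failed validation in the server log.
# count_validation_failures is a hypothetical helper; the here-doc sample
# stands in for /var/log/cassandra/system.log so the sketch runs anywhere.
count_validation_failures() {
    grep 'failed with error' "$1" | grep -c 'Validation failed'
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde failed with error ... Validation failed in /10.195.15.162
INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde finished
EOF
count_validation_failures "$sample"   # prints 1 for this sample
rm -f "$sample"
```

The "Validation failed in /10.195.15.162" address then tells you which replica's validation compaction to investigate.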
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652680#comment-14652680 ] mlowicki commented on CASSANDRA-9935:
--
Yes, I'm using LCS. I'll run scrub on these nodes and then repair. Will let you know about the result.

> Repair fails with RuntimeException
> --
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652638#comment-14652638 ] mlowicki commented on CASSANDRA-9935:
--
[~yukim] ping.

> Repair fails with RuntimeException
> --
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
[jira] [Commented] (CASSANDRA-8821) Errors in JVM_OPTS and cassandra_parms environment vars
[ https://issues.apache.org/jira/browse/CASSANDRA-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651743#comment-14651743 ] mlowicki commented on CASSANDRA-8821:
--
Because of this bug, e.g. {{cassandra service status}} doesn't work.

> Errors in JVM_OPTS and cassandra_parms environment vars
> ---
>
>                 Key: CASSANDRA-8821
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8821
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Ubuntu 14.04 LTS amd64
>            Reporter: Terry Moschou
>            Assignee: Michael Shuler
>            Priority: Minor
>             Fix For: 2.1.x, 2.2.x
>         Attachments: 8821_2.0.txt, 8821_2.1.txt
>
> Repos:
> deb http://www.apache.org/dist/cassandra/debian 21x main
> deb-src http://www.apache.org/dist/cassandra/debian 21x main
> The cassandra init script /etc/init.d/cassandra sources the environment file /etc/cassandra/cassandra-env.sh twice: once directly from the init script, and again inside /usr/sbin/cassandra. The result is that arguments in JVM_OPTS are duplicated. Further, the JVM opt -XX:CMSWaitDuration=1 is defined twice if jvm = 1.7.60. Also, the environment variable CASSANDRA_CONF, used in the context -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler, is undefined when /etc/cassandra/cassandra-env.sh is sourced from the init script. Lastly, the variable cassandra_storagedir is undefined in /usr/sbin/cassandra when used in the context -Dcassandra.storagedir=$cassandra_storagedir
>
> -- This message was sent by Atlassian JIRA (v6.3.4#6332)
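The double-sourcing described above can be made harmless with an include guard, sketched below. This is only an illustration of the idea, not the attached 8821 patches; {{cassandra_env}} and {{CASSANDRA_ENV_SOURCED}} are hypothetical names simulating the env file being sourced twice.

```shell
#!/bin/sh
# Illustrative sketch only (not the 8821_2.0.txt / 8821_2.1.txt patches):
# an include guard so sourcing the env file twice does not duplicate JVM_OPTS.
cassandra_env() {
    # Simulates /etc/cassandra/cassandra-env.sh being sourced.
    if [ -n "${CASSANDRA_ENV_SOURCED:-}" ]; then
        return 0                  # already sourced: skip re-appending flags
    fi
    CASSANDRA_ENV_SOURCED=1
    JVM_OPTS="${JVM_OPTS:-} -XX:+UseConcMarkSweepGC"
}

cassandra_env   # first source (as from /etc/init.d/cassandra)
cassandra_env   # second source (as from /usr/sbin/cassandra): no duplicates
echo "JVM_OPTS:$JVM_OPTS"   # the GC flag appears exactly once
```

Without the guard, every flag appended by the env file would appear twice in the final java command line, which is exactly the symptom reported.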
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649575#comment-14649575 ] mlowicki commented on CASSANDRA-9935:
--
Failed with the same error after ~13 hours:
{code}
[2015-07-31 16:57:43,909] Repair command #5 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
        at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
        at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
{code}
Log file - https://drive.google.com/file/d/0B_8mc_afWmd2OV96RDZBclRNSFE/view?usp=sharing.

> Repair fails with RuntimeException
> --
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649575#comment-14649575 ] mlowicki edited comment on CASSANDRA-9935 at 7/31/15 6:19 PM:
--
Failed with the same error after ~13 hours:
{code}
[2015-07-31 16:57:43,909] Repair command #5 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
        at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
        at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
{code}
Log file - https://drive.google.com/file/d/0B_8mc_afWmd2OV96RDZBclRNSFE/view?usp=sharing.
Tried yesterday to run repair in another DC but got the same error.

was (Author: mlowicki):
Failed with the same error after ~13 hours:
{code}
[2015-07-31 16:57:43,909] Repair command #5 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
        at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
        at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
{code}
Log file - https://drive.google.com/file/d/0B_8mc_afWmd2OV96RDZBclRNSFE/view?usp=sharing.

> Repair fails with RuntimeException
> --
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648760#comment-14648760 ] mlowicki commented on CASSANDRA-9935:
--
{{nodetool scrub sync}} finished on db1.sync.lati.osa and db5.sync.lati.osa. Just launched repair, but it can take up to 10-12 hours before it crashes. Will keep you updated.

> Repair fails with RuntimeException
> --
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
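The scrub-then-repair sequence tried in the comments above can be sketched as follows. The node names come from this thread; the commands are echoed rather than executed so the sketch is safe to run without a cluster, and the exact flags should be checked against the installed nodetool version.

```shell
#!/bin/sh
# Sketch of the sequence from the comments: scrub the "sync" keyspace on the
# affected nodes, then re-run the repair. Commands are echoed, not executed.
for host in db1.sync.lati.osa db5.sync.lati.osa; do
    echo "nodetool -h $host scrub sync"   # rewrite sstables on each node
done
echo "nodetool repair --partitioner-range --parallel --in-local-dc sync"
```

Scrub rewrites (and drops unreadable rows from) the sstables, which is why it was suggested before retrying repair when validation compactions fail.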
[jira] [Created] (CASSANDRA-9935) Repair fails with RuntimeException
mlowicki created CASSANDRA-9935:
---
             Summary: Repair fails with RuntimeException
                 Key: CASSANDRA-9935
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
             Project: Cassandra
          Issue Type: Bug
         Environment: C* 2.1.8, Debian Wheezy
            Reporter: mlowicki
         Attachments: db1.sync.lati.osa.cassandra.log
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647925#comment-14647925 ]

mlowicki commented on CASSANDRA-9935:
-------------------------------------

{code}
$ ping db1.sync.lati.osa
PING a10-05-07.lati.osa (10.195.15.162): 56 data bytes
{code}

So you have the log attached to this ticket.

> Repair fails with RuntimeException
> ----------------------------------
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.8, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Yuki Morishita
>         Attachments: db1.sync.lati.osa.cassandra.log
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702); after upgrading to 2.1.8 it started to work faster, but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> 	at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> 	at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in the logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least two times in the attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range (5765414319217852786,5781018794516851576] failed with error org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80]
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80]
> 	at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) ~[apache-cassandra-2.1.8.jar:2.1.8]
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.8.jar:2.1.8]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, ...
> {code}
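As an aside for anyone triaging similar failures: the token ranges in ERROR lines like the one above can be pulled out of system.log with a small script, so each failed range can be retried individually (e.g. with a subrange repair, where available). This is an illustration, not from the ticket; the function name is made up and the log format is the 2.1-era one quoted above:

```shell
# Sketch: extract "start end" token pairs for repair sessions that failed,
# matching lines like:
#   ... Repair session <uuid> for range (<start>,<end>] failed with error ...
extract_failed_ranges() {
    grep 'Repair session .* failed with error' "$1" |
    sed -n 's/.*for range (\([-0-9]*\),\([-0-9]*\)\].*/\1 \2/p'
}
```

Run it as `extract_failed_ranges /var/log/cassandra/system.log` (path is an assumption) and feed each pair back into a targeted repair.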
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648101#comment-14648101 ]

mlowicki commented on CASSANDRA-9935:
-------------------------------------

Should I run {{nodetool scrub sync}} on db1.sync.lati.osa and db5.sync.lati.osa, or on all nodes inside this data center?

> Repair fails with RuntimeException
> ----------------------------------
>
>                 Key: CASSANDRA-9935
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.8, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Yuki Morishita
>         Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log
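If scrub does need to run DC-wide, a loop like the following can drive it. This is an illustration only, not from the ticket: the host list is a placeholder and the `ssh` command is echoed as a dry run (drop the `echo` to execute):

```shell
# Dry-run sketch: print (or, without the echo, run) `nodetool scrub sync`
# on each node of the data center. HOSTS is a placeholder list.
HOSTS="db1.sync.lati.osa db5.sync.lati.osa"

scrub_all() {
    for h in $HOSTS; do
        echo ssh "$h" nodetool scrub sync
    done
}
```

Running scrub one node at a time (rather than in parallel) keeps the extra compaction load bounded, which is usually the safer choice on a loaded cluster.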
[jira] [Updated] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-9935:
--------------------------------

    Attachment: db5.sync.lati.osa.cassandra.log

Attached log from 10.195.15.176 (db5.sync.lati.osa). Older ones are available at https://drive.google.com/file/d/0B_8mc_afWmd2Vnk4ZE5kS3J6OE0/view?usp=sharing and https://drive.google.com/file/d/0B_8mc_afWmd2UElxUEZQUmtsaFk/view?usp=sharing (they are bigger than 10 MB).

> Repair fails with RuntimeException
> ----------------------------------
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647981#comment-14647981 ]

mlowicki commented on CASSANDRA-9935:
-------------------------------------

More logs from db1.sync.lati.osa (10.195.15.162) are available at https://drive.google.com/file/d/0B_8mc_afWmd2QVk2VVRTRVl1ZDQ/view?usp=sharing and https://drive.google.com/file/d/0B_8mc_afWmd2MHREM2hzUlNjd0E/view?usp=sharing.

> Repair fails with RuntimeException
> ----------------------------------
[jira] [Commented] (CASSANDRA-9702) Repair running really slow
[ https://issues.apache.org/jira/browse/CASSANDRA-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611607#comment-14611607 ]

mlowicki commented on CASSANDRA-9702:
-------------------------------------

After another ~12 hours it progressed to 10.21%.

> Repair running really slow
> --------------------------
>
>                 Key: CASSANDRA-9702
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9702
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>         Attachments: db1.system.log
>
> We've been using 2.1.x since the very beginning and have always had problems with failing or slow repair. In one data center we haven't been able to finish repair for many weeks (partially because of CASSANDRA-9681, as we needed to reboot nodes periodically). I launched it this morning (12 hours ago now) and monitor it using https://github.com/spotify/cassandra-opstools/blob/master/bin/spcassandra-repairstats. For the first hour it progressed to 9.43%, but then it took ~10 hours to reach 9.44%. I see repair-related logs very rarely (every 15-20 minutes, sometimes nothing new for an hour). Repair was launched with:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc {keyspace}
> {code}
> Attached is a log file from today. We have ~4.1 TB of data on 12 nodes with RF set to 3 (2 DCs with 6 nodes each).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
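The percentages above come from spcassandra-repairstats; a rough equivalent can be computed straight from one node's log by counting finished sessions against the expected number of ranges. A sketch, not from the ticket - the log path, the 2.1-era "Repair session ... finished" message format, and the total-ranges figure are all assumptions:

```shell
# Rough repair progress: completed sessions / expected total, as a percentage.
# Usage: repair_progress /var/log/cassandra/system.log <expected_total_ranges>
repair_progress() {
    log=$1 total=$2
    done_count=$(grep -c 'Repair session .* finished' "$log")
    # awk does the division to avoid integer-only shell arithmetic quirks
    awk -v d="$done_count" -v t="$total" 'BEGIN { printf "%d%%\n", d * 100 / t }'
}
```

This over-counts if the log contains sessions from earlier repair runs, so it is only a sanity check against the external tool.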
[jira] [Comment Edited] (CASSANDRA-9702) Repair running really slow
[ https://issues.apache.org/jira/browse/CASSANDRA-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611607#comment-14611607 ]

mlowicki edited comment on CASSANDRA-9702 at 7/2/15 1:55 PM:
-------------------------------------------------------------

After another ~12 hours it progressed to 10.21%. 6 hours later it's at 10.52%.

was (Author: mlowicki):
After another ~12 hours it progressed to 10.21%.

> Repair running really slow
> --------------------------
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609848#comment-14609848 ]

mlowicki commented on CASSANDRA-9681:
-------------------------------------

After a couple of hours it's still fine - https://www.dropbox.com/s/ox5xzxqbojyv7wz/Screenshot%202015-07-01%2011.49.53.png?dl=0. It always started to grow right after a restart, so we can assume this problem is fixed.

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>         Attachments: cassandra.yaml, db5.system.log, db5.system.log.1.zip, db5.system.log.2.zip, db5.system.log.3.zip, schema.cql, system.log.6.zip, system.log.7.zip, system.log.8.zip, system.log.9.zip
>
> A C* 2.1.7 cluster starts behaving really badly after 1-2 days. {{gauges.cassandra.jmx.org.apache.cassandra.metrics.ColumnFamily.AllMemtablesHeapSize.Value}} jumps to 7 GB (https://www.dropbox.com/s/vraggy292erkzd2/Screenshot%202015-06-29%2019.12.53.png?dl=0) on 3/6 nodes in each data center, and then there are many long GC pauses. The cluster is using the default heap size values ({{-Xms8192M -Xmx8192M -Xmn2048M}}). Before C* 2.1.5 the memtable heap size was basically constant at ~500 MB (https://www.dropbox.com/s/fjdywik5lojstvn/Screenshot%202015-06-29%2019.30.00.png?dl=0). After restarting all nodes it behaves stably for 1-2 days. I did that today and the long GC pauses are gone (~18:00, https://www.dropbox.com/s/7vo3ynz505rsfq3/Screenshot%202015-06-29%2019.28.37.png?dl=0). The only pattern we've found so far is that long GC pauses happen at basically the same time on all nodes in the same data center - even on ones where the memtable heap size is not growing. Cliffs on the graphs are node restarts. Used memory on boxes where {{AllMemtablesHeapSize}} grows stays at the same level - https://www.dropbox.com/s/tes9abykixs86rf/Screenshot%202015-06-29%2019.37.52.png?dl=0. Replication factor is set to 3.
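For quantifying the pauses discussed here: when the JVM runs with {{-XX:+PrintGCApplicationStoppedTime}}, stop-the-world pauses above a threshold can be listed straight from gc.log. A sketch, not from the ticket - the line format is the JDK 7 one assumed from this environment:

```shell
# List stop-the-world pauses longer than <min> seconds from a gc.log
# written with -XX:+PrintGCApplicationStoppedTime, whose lines look like:
#   ... Total time for which application threads were stopped: 2.1234567 seconds
# Usage: long_pauses /var/log/cassandra/gc.log 1
long_pauses() {
    log=$1 min=$2
    awk -v min="$min" '
        /Total time for which application threads were stopped/ {
            # the pause duration is the field right after "stopped:"
            for (i = 1; i <= NF; i++)
                if ($i == "stopped:" && $(i + 1) + 0 > min) print $(i + 1)
        }' "$log"
}
```

Correlating these timestamps across nodes would confirm (or rule out) the "pauses happen at the same time in one DC" pattern described above.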
[jira] [Created] (CASSANDRA-9702) Repair running really slow
mlowicki created CASSANDRA-9702:
-----------------------------------

             Summary: Repair running really slow
                 Key: CASSANDRA-9702
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9702
             Project: Cassandra
          Issue Type: Bug
         Environment: C* 2.1.7, Debian Wheezy
            Reporter: mlowicki
         Attachments: db1.system.log

We've been using 2.1.x since the very beginning and have always had problems with failing or slow repair. In one data center we haven't been able to finish repair for many weeks (partially because of CASSANDRA-9681, as we needed to reboot nodes periodically). I launched it this morning (12 hours ago now) and monitor it using https://github.com/spotify/cassandra-opstools/blob/master/bin/spcassandra-repairstats. For the first hour it progressed to 9.43%, but then it took ~10 hours to reach 9.44%. I see repair-related logs very rarely (every 15-20 minutes, sometimes nothing new for an hour). Repair was launched with:
{code}
nodetool repair --partitioner-range --parallel --in-local-dc {keyspace}
{code}
Attached is a log file from today.
[jira] [Updated] (CASSANDRA-9702) Repair running really slow
[ https://issues.apache.org/jira/browse/CASSANDRA-9702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-9702:
--------------------------------

    Description:
We've been using 2.1.x since the very beginning and have always had problems with failing or slow repair. In one data center we haven't been able to finish repair for many weeks (partially because of CASSANDRA-9681, as we needed to reboot nodes periodically). I launched it this morning (12 hours ago now) and monitor it using https://github.com/spotify/cassandra-opstools/blob/master/bin/spcassandra-repairstats. For the first hour it progressed to 9.43%, but then it took ~10 hours to reach 9.44%. I see repair-related logs very rarely (every 15-20 minutes, sometimes nothing new for an hour). Repair was launched with:
{code}
nodetool repair --partitioner-range --parallel --in-local-dc {keyspace}
{code}
Attached is a log file from today. We have ~4.1 TB of data on 12 nodes with RF set to 3 (2 DCs with 6 nodes each).

  was: the same description without the final sentence about data volume.

> Repair running really slow
> --------------------------
[jira] [Issue Comment Deleted] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-9681:
--------------------------------

    Comment: was deleted

(was: Great. I'm not a Java guy, so what is the best way to patch the jar file I've installed from the DataStax repo?)

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
[jira] [Comment Edited] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609685#comment-14609685 ]

mlowicki edited comment on CASSANDRA-9681 at 7/1/15 7:34 AM:
-------------------------------------------------------------

So far so good - https://www.dropbox.com/s/ad8te1g6iz2wofe/Screenshot%202015-07-01%2009.31.00.png?dl=0. I'll let you know whether it degrades or not. The GC pauses we talked about yesterday were probably caused by misbehaving Logstash or Kibana, as I've checked with jstat and gc.log that everything is fine on those boxes. All nodes in the cluster were patched at ~7am.

was (Author: mlowicki):
So far so good - https://www.dropbox.com/s/ad8te1g6iz2wofe/Screenshot%202015-07-01%2009.31.00.png?dl=0. I'll let you know whether it degrades or not. The GC pauses we talked about yesterday were probably caused by misbehaving Logstash or Kibana, as I've checked with jstat and gc.log that everything is fine on those boxes.

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609685#comment-14609685 ] mlowicki commented on CASSANDRA-9681: - So far so good - https://www.dropbox.com/s/ad8te1g6iz2wofe/Screenshot%202015-07-01%2009.31.00.png?dl=0. I'll let you know whether it degrades. The GC pauses we talked about yesterday were probably caused by misbehaving Logstash or Kibana; I've checked with jstat and gc.log that everything is fine on these boxes.
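The jstat/gc.log check mentioned above can be partly automated: Cassandra 2.1's GCInspector also writes pause durations to system.log, so long pauses can be filtered out with grep and awk. A minimal sketch - the sample log lines and the 200 ms threshold are made up for illustration, and the exact GCInspector wording varies between point releases:

```shell
# Two made-up GCInspector-style lines standing in for a real system.log.
log='INFO GCInspector ConcurrentMarkSweep GC in 1512ms
INFO GCInspector ParNew GC in 50ms'

# Keep only pauses longer than 200 ms; awk's "$4 + 0" drops the "ms" suffix
# so the duration compares numerically.
long_pauses=$(printf '%s\n' "$log" \
  | grep -oE '[A-Za-z]+ GC in [0-9]+ms' \
  | awk '$4 + 0 > 200')
echo "$long_pauses"
```

Pointing the same pipeline at the real log (replace the `log` variable with `cat /var/log/cassandra/system.log`) gives a quick per-node pause summary without pulling full GC logs.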
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607913#comment-14607913 ] mlowicki commented on CASSANDRA-9681: - https://www.dropbox.com/s/cnv36bbdznbwc0g/Screenshot%202015-06-30%2010.07.27.png?dl=0 - this is a chart from the box where I was creating the heap dump. Please keep in mind that the metric changes rapidly; it can grow from ~300MB to over 1GB within 3 minutes. I'll prepare a heap dump again today. I'm using jmap: {code} root@db5:/var# jmap -F -dump:file=cassandra.bin 19189 {code} and this C* node appears dead to the rest of the cluster for ~40 minutes. Can this be avoided?
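Since the gauge can jump from ~300MB to over 1GB within 3 minutes, catching the right moment for a heap dump by eye is hard. A sketch of how one might flag rapid growth from sampled gauge values - the "epoch_seconds bytes" input format, the sample values, and the 200 MB/min threshold are all assumptions for illustration, not anything Cassandra produces directly:

```shell
# Made-up samples of AllMemtablesHeapSize: "epoch_seconds bytes",
# mirroring the ~300 MB -> ~1.1 GB jump described above.
samples='0 314572800
180 1153433600'

# Print an alert for any consecutive pair of samples whose growth rate
# exceeds 200 MB/minute; a wrapper could trigger jmap at that point.
alerts=$(printf '%s\n' "$samples" | awk 'NR > 1 {
    rate = ($2 - b) / (($1 - t) / 60)        # bytes per minute
    if (rate > 200 * 1048576)
        printf "rapid growth at t=%s: %.0f MB/min\n", $1, rate / 1048576
} { t = $1; b = $2 }')
echo "$alerts"
```

Feeding this from whatever exports the Graphite gauge (one sample per line) would let the dump start while the memtable heap is still climbing.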
[jira] [Comment Edited] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607942#comment-14607942 ] mlowicki edited comment on CASSANDRA-9681 at 6/30/15 8:31 AM: -- Attaching logs from db5: {code} -rw-r--r-- 1 cassandra cassandra 3.3M Jun 30 07:58 system.log -rw-r--r-- 1 cassandra cassandra 854K Jun 29 14:19 system.log.1.zip -rw-r--r-- 1 cassandra cassandra 1.3M Jun 27 22:31 system.log.2.zip -rw-r--r-- 1 cassandra cassandra 1.8M Jun 24 11:43 system.log.3.zip {code} Memtable heap size on this box behaves as on the chart - https://www.dropbox.com/s/l9cgch2hlguco85/Screenshot%202015-06-30%2010.30.59.png?dl=0
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607934#comment-14607934 ] mlowicki commented on CASSANDRA-9681: - Without -F it gives: {code} root@db5:/var# jmap -dump:file=cassandra.bin 19189 19189: Unable to open socket file: target process not responding or HotSpot VM not loaded The -F option can be used when the target process is not responding {code} I started dumping the heap when the metric showed 1.7GB. Will attach soon. Logs will be available shortly.
[jira] [Comment Edited] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607913#comment-14607913 ] mlowicki edited comment on CASSANDRA-9681 at 6/30/15 8:12 AM: -- https://www.dropbox.com/s/cnv36bbdznbwc0g/Screenshot%202015-06-30%2010.07.27.png?dl=0 - this is a chart from the box where I was creating the heap dump. Please keep in mind that the metric changes rapidly; it can grow from ~300MB to over 1GB within 3 minutes. I'll prepare a heap dump again today. I'm using jmap: {code} root@db5:/var# jmap -F -dump:file=cassandra.bin 19189 {code} and this C* node appears dead to the rest of the cluster for ~40 minutes (https://gist.github.com/mlowicki/7645963e2a1ac4563578). Can this be avoided?
[jira] [Updated] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-9681: Attachment: db5.system.log.3.zip db5.system.log.2.zip db5.system.log.1.zip db5.system.log Attaching logs from db5: {code} -rw-r--r-- 1 cassandra cassandra 3.3M Jun 30 07:58 system.log -rw-r--r-- 1 cassandra cassandra 854K Jun 29 14:19 system.log.1.zip -rw-r--r-- 1 cassandra cassandra 1.3M Jun 27 22:31 system.log.2.zip -rw-r--r-- 1 cassandra cassandra 1.8M Jun 24 11:43 system.log.3.zip {code}
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608231#comment-14608231 ] mlowicki commented on CASSANDRA-9681: - Cool. If more logs / dumps / cheers are needed, just let me know.
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608278#comment-14608278 ] mlowicki commented on CASSANDRA-9681: - Sure, just let me know and we'll try to apply the patch.
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608082#comment-14608082 ] mlowicki commented on CASSANDRA-9681: - Heap dump - https://drive.google.com/file/d/0B_8mc_afWmd2bGhpd0p2Ql9UMkU/view?usp=sharing.
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608449#comment-14608449 ] mlowicki commented on CASSANDRA-9681: - Great. I'm not a Java guy, so what is the best way to patch the jar file I installed from the DataStax repo?
[jira] [Updated] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-9681: Attachment: schema.cql Attaching our schema. We're using LCS and we aren't using secondary indexes. The heap dump is uploading to Google Drive, so it should be available soon.
[jira] [Created] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
mlowicki created CASSANDRA-9681: --- Summary: Memtable heap size grows and many long GC pauses are triggered Key: CASSANDRA-9681 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681 Project: Cassandra Issue Type: Bug Environment: C* 2.1.7, Debian Wheezy Reporter: mlowicki Priority: Critical
[jira] [Updated] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mlowicki updated CASSANDRA-9681: Description: as above, with "Cliffs on the graphs are nodes restarts." appended.
[jira] [Updated] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-9681:
--------------------------------
    Description:

C* 2.1.7 cluster is behaving really badly after 1-2 days. {{gauges.cassandra.jmx.org.apache.cassandra.metrics.ColumnFamily.AllMemtablesHeapSize.Value}} jumps to 7 GB (https://www.dropbox.com/s/vraggy292erkzd2/Screenshot%202015-06-29%2019.12.53.png?dl=0) on 3/6 nodes in each data center and then there are many long GC pauses. The cluster is using the default heap size values ({{-Xms8192M -Xmx8192M -Xmn2048M}}).

Before C* 2.1.5 the memtables heap size was basically constant at ~500 MB (https://www.dropbox.com/s/fjdywik5lojstvn/Screenshot%202015-06-29%2019.30.00.png?dl=0). After restarting all nodes it behaves stably for 1-2 days. Today I've done that and the long GC pauses are gone (~18:00, https://www.dropbox.com/s/7vo3ynz505rsfq3/Screenshot%202015-06-29%2019.28.37.png?dl=0).

The only pattern we've found so far is that long GC pauses happen at basically the same time on all nodes in the same data center - even on the ones where the memtables heap size is not growing. Cliffs on the graphs are node restarts. Used memory on boxes where {{AllMemtablesHeapSize}} grows stays at the same level - https://www.dropbox.com/s/tes9abykixs86rf/Screenshot%202015-06-29%2019.37.52.png?dl=0.

  was: the same description without the final sentence about used memory.

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Priority: Critical
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
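For scale, it helps to compare the reported 7 GB figure against what the defaults should permit. Assuming the 2.1-era default that an unset {{memtable_heap_space_in_mb}} is taken as one quarter of the max heap (verify against your cassandra.yaml), a back-of-envelope check:

```python
# Back-of-envelope check. Assumes the 2.1 default: with
# memtable_heap_space_in_mb unset, the memtable heap cap is 1/4 of -Xmx.
heap_mb = 8192                  # -Xmx8192M from the report
memtable_cap_mb = heap_mb // 4  # assumed default cap -> 2048 MB
observed_mb = 7 * 1024          # ~7 GB AllMemtablesHeapSize from the graphs

overshoot = observed_mb / memtable_cap_mb
print(f"cap={memtable_cap_mb} MB, observed={observed_mb} MB, "
      f"overshoot={overshoot:.1f}x")
```

A gauge sitting several times above the configured cap suggests memtables are being retained (e.g. flushes blocked or falling behind) rather than merely being large under normal write load.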
[jira] [Issue Comment Deleted] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-9681:
--------------------------------
    Comment: was deleted

(was: I'll get heap dump probably tomorrow then as nodes have been restarted ~2 hours ago.)

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>         Attachments: cassandra.yaml, system.log.6.zip, system.log.7.zip, system.log.8.zip, system.log.9.zip
>
[jira] [Updated] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-9681:
--------------------------------
    Attachment: cassandra.yaml

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>         Attachments: cassandra.yaml
>
[jira] [Updated] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-9681:
--------------------------------
    Attachment: system.log.9.zip
                system.log.8.zip
                system.log.7.zip
                system.log.6.zip

{code}
-rw-r--r-- 1 cassandra cassandra 1.3M Jun 12 14:32 system.log.6.zip
-rw-r--r-- 1 cassandra cassandra 1.9M Jun 10 13:11 system.log.7.zip
-rw-r--r-- 1 cassandra cassandra 1.9M Jun  6 21:55 system.log.8.zip
-rw-r--r-- 1 cassandra cassandra 1.9M Jun  4 01:29 system.log.9.zip
{code}

Logs from the time when it basically started. If more are needed, just let me know.

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>         Attachments: cassandra.yaml, system.log.6.zip, system.log.7.zip, system.log.8.zip, system.log.9.zip
>
[jira] [Updated] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mlowicki updated CASSANDRA-9681:
--------------------------------
    Description: appended "Replication factor is set to 3." to the existing description (otherwise unchanged).

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>         Attachments: cassandra.yaml
>
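The "pauses at the same time on all nodes" pattern can be checked mechanically by pulling long-pause timestamps out of each node's system.log and bucketing them per minute. A rough sketch - the GCInspector line format below is an assumption based on 2.1-era logs, so adjust the regex to your build:

```python
import re
from collections import defaultdict

# Assumed 2.1-era GCInspector line, e.g.:
# INFO  [Service Thread] 2015-06-29 18:00:01,123 GCInspector.java:252 -
#   ConcurrentMarkSweep GC in 2412ms. ...
GC_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d+ GCInspector"
    r".*? GC in (?P<ms>\d+)ms"
)

def pause_minutes(log_text, threshold_ms=1000):
    """Return the set of minutes in which a long GC pause was logged."""
    minutes = set()
    for line in log_text.splitlines():
        m = GC_LINE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            minutes.add(m.group("ts"))
    return minutes

def correlated(minute_sets):
    """Given one pause-minute set per node, keep minutes seen on >1 node."""
    counts = defaultdict(int)
    for s in minute_sets:
        for minute in s:
            counts[minute] += 1
    return {m: c for m, c in counts.items() if c > 1}
```

Feeding each node's log through {{pause_minutes}} and the results through {{correlated}} makes the cross-node coincidence quantifiable instead of eyeballed from graphs.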
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606057#comment-14606057 ]

mlowicki commented on CASSANDRA-9681:
-------------------------------------

It started ~04.06, at about the same time on all affected boxes - https://www.dropbox.com/s/9c6p2xdmncktbnu/Screenshot%202015-06-29%2020.16.02.png?dl=0, https://www.dropbox.com/s/gs8bztzr394icz0/Screenshot%202015-06-29%2020.16.24.png?dl=0. Will attach logs soon.

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>         Attachments: cassandra.yaml
>
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606084#comment-14606084 ]

mlowicki commented on CASSANDRA-9681:
-------------------------------------

I'll get a heap dump probably tomorrow then, as the nodes have been restarted ~2 hours ago.

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>         Attachments: cassandra.yaml, system.log.6.zip, system.log.7.zip, system.log.8.zip, system.log.9.zip
>
[jira] [Commented] (CASSANDRA-9681) Memtable heap size grows and many long GC pauses are triggered
[ https://issues.apache.org/jira/browse/CASSANDRA-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606399#comment-14606399 ]

mlowicki commented on CASSANDRA-9681:
-------------------------------------

https://www.dropbox.com/s/nhgudkyxwjdrq0f/cassandra.bin?dl=0

This heap dump was created when the memtables heap size was ~800 MB (on unaffected boxes it's ~500 MB).

> Memtable heap size grows and many long GC pauses are triggered
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-9681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9681
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.7, Debian Wheezy
>            Reporter: mlowicki
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.x
>         Attachments: cassandra.yaml, system.log.6.zip, system.log.7.zip, system.log.8.zip, system.log.9.zip
>
[jira] [Commented] (CASSANDRA-9612) Assertion error while running `nodetool cfstats`
[ https://issues.apache.org/jira/browse/CASSANDRA-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592309#comment-14592309 ]

mlowicki commented on CASSANDRA-9612:
-------------------------------------

[~mambocab] yes.

> Assertion error while running `nodetool cfstats`
> ------------------------------------------------
>
>                 Key: CASSANDRA-9612
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9612
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: C* 2.1.6
>            Reporter: mlowicki
>
> nodetool cfstats sync.entity
> {code}
> Keyspace: sync
>     Read Count: 2916573
>     Read Latency: 0.26340278573517617 ms.
>     Write Count: 2356495
>     Write Latency: 0.03296340242606074 ms.
>     Pending Flushes: 0
>         Table: entity
>         SSTable count: 919
>         SSTables in each level: [50/4, 11/10, 101/100, 756, 0, 0, 0, 0, 0]
>         Space used (live): 146265014558
>         Space used (total): 146265014558
>         Space used by snapshots (total): 0
>         Off heap memory used (total): 97950899
>         SSTable Compression Ratio: 0.1870809135227128
> error: /var/lib/cassandra/data2/sync/entity-f73d1360770e11e49f1d673dc3e50a5f/sync-entity-tmplink-ka-516810-Data.db
> -- StackTrace --
> java.lang.AssertionError: /var/lib/cassandra/data2/sync/entity-f73d1360770e11e49f1d673dc3e50a5f/sync-entity-tmplink-ka-516810-Data.db
>         at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:270)
>         at org.apache.cassandra.metrics.ColumnFamilyMetrics$9.value(ColumnFamilyMetrics.java:296)
>         at org.apache.cassandra.metrics.ColumnFamilyMetrics$9.value(ColumnFamilyMetrics.java:290)
>         at com.yammer.metrics.reporting.JmxReporter$Gauge.getValue(JmxReporter.java:63)
>         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
>         at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>         at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>         at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
>         at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
>         at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
>         at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1464)
>         at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
>         at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
>         at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
>         at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:657)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
>         at sun.rmi.transport.Transport$2.run(Transport.java:202)
>         at sun.rmi.transport.Transport$2.run(Transport.java:199)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:198)
>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:567)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:828)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(TCPTransport.java:619)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:684)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(TCPTransport.java:681)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:681)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> {code}
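The {{AssertionError}} above points at a {{tmplink}} SSTable file, i.e. a temporary hard link that {{getApproximateKeyCount}} trips over when it is left behind. A quick way to see whether such leftovers exist is a find over the data directories; this is a sketch, and {{DATA_DIR}} is an assumed path to substitute with your actual {{data_file_directories}}:

```shell
#!/bin/sh
# List leftover tmplink SSTable files under the data directory.
# DATA_DIR is an assumption -- point it at your data_file_directories.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data2}"
find "$DATA_DIR" -type f -name '*tmplink*'
```

Any hits that persist while no compaction or streaming is running are candidates for the stale links this assertion complains about.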