[
https://issues.apache.org/jira/browse/CASSANDRA-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394627#comment-14394627
]
Sergey Maznichenko commented on CASSANDRA-9092:
-----------------------------------------------
We have the OpsCenter Agent running. These errors repeat 1-2 times per hour
while data is being loaded. DC1 currently has no hints.
I guess traffic can reach all nodes because of the client settings; I will
check that.
I tried running 'nodetool repair' from a node in DC2 and, after a 30-hour
delay, got a bunch of errors in the console, such as:
[2015-04-02 19:32:14,352] Repair session 6ff4f071-d94d-11e4-9257-f7b14a924a15
for range (-3563451573336693456,-3535530477916720868] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor (/10.XX.XX.11)
is dead: session failed
However, 'nodetool status' reports that all nodes are up, and I can see
successful communication between the nodes in their logs. It's strange...
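To see which neighbors the failed repair sessions blame, the console output
could be tallied with something like the sketch below. It assumes the message
format shown above; the function name dead_neighbors and the regular
expression are illustrative only and are not part of Cassandra or nodetool.

```python
import re
from collections import Counter

# Matches the repair failure message quoted above. The console wraps long
# lines, so whitespace is normalized before matching.
REPAIR_FAILURE = re.compile(
    r"Repair session (?P<session>[0-9a-f-]+) "
    r"for range \((?P<start>-?\d+),(?P<end>-?\d+)\] failed with error "
    r".*?a neighbor \((?P<neighbor>[^)]+)\) is dead"
)


def dead_neighbors(console_output):
    """Count how often each neighbor is reported dead (hypothetical helper)."""
    text = " ".join(console_output.split())  # undo console line wrapping
    return Counter(m.group("neighbor") for m in REPAIR_FAILURE.finditer(text))


sample = """[2015-04-02 19:32:14,352] Repair session 6ff4f071-d94d-11e4-9257-f7b14a924a15
for range (-3563451573336693456,-3535530477916720868] failed with error
java.io.IOException: Cannot proceed on repair because a neighbor (/10.XX.XX.11)
is dead: session failed"""

print(dead_neighbors(sample))
```

If the same one or two addresses dominate the counts, that would point at
specific nodes rather than at the repair as a whole.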
> Nodes in DC2 die during and after huge write workload
> -----------------------------------------------------
>
> Key: CASSANDRA-9092
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9092
> Project: Cassandra
> Issue Type: Bug
> Environment: CentOS 6.2 64-bit, Cassandra 2.1.2,
> java version "1.7.0_71"
> Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
> Reporter: Sergey Maznichenko
> Assignee: Sam Tunnicliffe
> Fix For: 2.1.5
>
> Attachments: cassandra_crash1.txt
>
>
> Hello,
> We have Cassandra 2.1.2 with 8 nodes, 4 in DC1 and 4 in DC2.
> Each node is a VM with 8 CPUs and 32GB RAM.
> During a significant workload (loading several million blobs of ~3.5MB each),
> 1 node in DC2 stops, and after some time 2 more nodes in DC2 also stop.
> Now 2 of the nodes in DC2 do not work: they stop 5-10 minutes after starting.
> I see many files in the system.hints table, and the error appears 2-3 minutes
> after system.hints auto compaction starts.
> By "stops" I mean the following errors:
> ERROR [CompactionExecutor:1] 2015-04-01 23:33:44,456
> CassandraDaemon.java:153 - Exception in thread
> Thread[CompactionExecutor:1,1,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR [HintedHandoff:1] 2015-04-01 23:33:44,456 CassandraDaemon.java:153 -
> Exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.lang.OutOfMemoryError: Java heap space
> The full error listing is attached as cassandra_crash1.txt.
> The problem exists only in DC2. We have 1GbE between DC1 and DC2.