[
https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834628#action_12834628
]
Ryan King commented on CASSANDRA-800:
-------------------------------------
I'm skeptical about it too, but I've seen stranger effects from OOMEs. We've
made some config changes to (hopefully) reduce heap pressure. I'll let you
know whether that improves the situation or not.
> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
> Key: CASSANDRA-800
> URL: https://issues.apache.org/jira/browse/CASSANDRA-800
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.5, 0.6, 0.7
> Reporter: Ryan King
> Assignee: Jaakko Laine
> Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping, possibly because of a race
> condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java
> (line 484) Problem reading from socket connected to :
> java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000
> remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java
> (line 484) Problem reading from socket connected to :
> java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000
> remote=/10.209.23.80:36128]
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java
> (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread
> MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80
> BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading
> from: /10.209.23.80 BufferSizeRemaining: 16
> at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
> at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
> at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
> at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096)
> Internal error processing batch_insert
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
> at java.util.HashMap$KeyIterator.next(HashMap.java:883)
> at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
> at java.util.HashSet.<init>(HashSet.java:100)
> at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
> at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
> at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
> at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
> at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
> at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
> at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress
> /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress
> /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress
> /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress
> /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress
> /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress
> /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress
> /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress
> /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress
> /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress
> /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress
> /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress
> /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010
> 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress
> /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress
> /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress
> /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress
> /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress
> /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress
> /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress
> /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which
> claims to be:
> git-svn-id:
> https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-...@9035
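
For readers unfamiliar with the ConcurrentModificationException in the batch_insert trace: it is thrown while Gossiper.getLiveMembers copies a HashSet, which is what happens when the set is structurally modified during iteration. The sketch below is not Cassandra's code. The class name, method name, and addresses are hypothetical, and the mutation is done inline to stand in for a second thread so that the behavior is reproducible. It contrasts a plain HashSet with a concurrent set from ConcurrentHashMap.newKeySet(), one possible alternative, not necessarily the fix applied to this issue.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names come from Cassandra.
public class LiveMembersSketch {

    // Iterates over the set while adding an endpoint, standing in for a
    // gossip thread marking a node UP while another thread copies the set.
    // Returns true if the iteration fails with ConcurrentModificationException.
    static boolean iterationFails(Set<String> liveEndpoints) {
        try {
            for (String endpoint : liveEndpoints) {
                // Structural modification in the middle of the iteration.
                liveEndpoints.add("10.0.0.99");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Made-up addresses, not taken from the logs above.
        Set<String> plain = new HashSet<>(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));

        // Concurrent sets have weakly consistent iterators that tolerate
        // modification during iteration.
        Set<String> concurrent = ConcurrentHashMap.newKeySet();
        concurrent.addAll(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));

        System.out.println("plain HashSet fails: " + iterationFails(plain));
        System.out.println("concurrent set fails: " + iterationFails(concurrent));
    }
}
```

With a real second thread the failure on the plain HashSet is intermittent rather than guaranteed, which would be consistent with the sporadic errors reported here.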