[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

Brandon Williams (JIRA) Tue, 14 Jun 2011 11:39:53 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049326#comment-13049326 ]


Brandon Williams commented on CASSANDRA-2768:
---------------------------------------------

bq. Since Sasha has reportedly verified that all nodes report being on 0.8.0, 
this suggests a Gossiper bug that reports the wrong version (even after node 
restarts).

Gossiper's setVersion and getVersion are fairly straightforward.  setVersion 
is called from IncomingTcpConnection, which sets the version to whatever the 
remote node reported, so a bug here looks unlikely.  The version information 
is not persisted anywhere, so the remote node has to be indicating it is 0.7.  
I was unable to reproduce this following Sasha's steps, so I think the most 
likely explanation is that a node is mistakenly still on 0.7.
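
To make the reasoning above concrete, here is a minimal sketch of in-memory 
version tracking.  It is not the actual Gossiper code: the class name, the 
addr helper, the null return for unknown peers, and the numeric version 
constants are all assumptions made for illustration.  The only points it 
tries to show are that the stored value is whatever the peer announced, and 
that nothing survives a restart.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: names, method bodies, and version numbers are assumptions,
// not the actual Gossiper source.
class VersionTrackingSketch {
    static final int VERSION_07 = 1;   // hypothetical wire version for 0.7
    static final int VERSION_080 = 2;  // hypothetical wire version for 0.8.0

    // Held in memory only, so the map starts empty after every restart.
    private final Map<InetAddress, Integer> versions = new ConcurrentHashMap<>();

    // Called when an incoming connection announces the peer's version.
    void setVersion(InetAddress endpoint, int version) {
        versions.put(endpoint, version);
    }

    // Returns the last announced version, or null if the peer has not announced one.
    Integer getVersion(InetAddress endpoint) {
        return versions.get(endpoint);
    }

    // Parses an IP literal (no DNS lookup) without exposing a checked exception.
    static InetAddress addr(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress peer = addr("10.128.34.18");

        VersionTrackingSketch beforeRestart = new VersionTrackingSketch();
        beforeRestart.setVersion(peer, VERSION_07);
        System.out.println("before restart: " + beforeRestart.getVersion(peer));

        // A restart is modeled as a fresh instance: the old value is gone, and
        // the peer has to announce its version again on its next connection.
        VersionTrackingSketch afterRestart = new VersionTrackingSketch();
        System.out.println("after restart:  " + afterRestart.getVersion(peer));
    }
}
```

Under these assumptions, a stale 0.7 entry cannot outlive a restart of the 
local node, so a 0.7 reading after a restart has to come from the peer itself.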

{quote}
I am seeing this error on two of the nodes:

ERROR [pool-2-thread-14] 2011-06-14 23:33:40,544 CustomTThreadPoolServer.java 
(line 199) Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in 
readMessageBegin, old client?
{quote}

This indicates a client-side Thrift compatibility problem and is unrelated, 
as Thrift is not used for internode communication.
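
For reference, the check behind that message can be sketched as follows.  
This is a standalone approximation modeled on the strict-read logic of 
Thrift's binary protocol, not the library code itself: the class and method 
names and the returned strings are made up for illustration, and it only 
inspects the first four bytes of a message.

```java
import java.nio.ByteBuffer;

// Illustrative only: a standalone approximation of a strict version check on
// the first 32-bit word of a binary-protocol message. Names are hypothetical.
class StrictVersionCheckSketch {
    static final int VERSION_MASK = 0xffff0000;
    static final int VERSION_1 = 0x80010000;

    static String checkHeader(byte[] firstFourBytes, boolean strictRead) {
        // ByteBuffer reads big-endian by default, matching network byte order.
        int word = ByteBuffer.wrap(firstFourBytes).getInt();
        if (word < 0) {
            // High bit set: the upper 16 bits carry the protocol version.
            return (word & VERSION_MASK) == VERSION_1
                    ? "ok: versioned message"
                    : "error: bad version";
        }
        // High bit clear: the word is a plain length, with no version at all.
        return strictRead
                ? "error: missing version, old client?"
                : "ok: unversioned message accepted";
    }

    public static void main(String[] args) {
        byte[] versioned = {(byte) 0x80, 0x01, 0x00, 0x01};
        byte[] unversioned = {0x00, 0x00, 0x00, 0x03};
        System.out.println(checkHeader(versioned, true));
        System.out.println(checkHeader(unversioned, true));
    }
}
```

Any client whose first four bytes do not carry the version marker would trip 
this check on a strict server, for example one that sends a length prefix 
first.  Either way it concerns the client port only.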

> AntiEntropyService excluding nodes that are on version 0.7 or sooner
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-2768
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2768
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>         Environment: 4 node environment -- 
> Originally 0.7.6-2 with a Keyspace defined with RF=3
> Upgraded all nodes ( 1 at a time ) to version 0.8.0:  For each node, the node 
> was shut down, new version was turned on, using the existing data files / 
> directories and a nodetool repair was run.  
>            Reporter: Sasha Dolgy
>            Assignee: Brandon Williams
>
> When I run nodetool repair on any of the nodes, the 
> /var/log/cassandra/system.log reports errors similar to:
> INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 
> 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from 
> repair because it is on version 0.7 or sooner. You should consider updating 
> this node before running repair again.
> ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 
> 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in 
> thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI 
> Runtime]
> java.util.ConcurrentModificationException
>       at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>       at java.util.HashMap$KeyIterator.next(HashMap.java:828)
>       at 
> org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173)
>       at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776)
> The INFO message and subsequent ERROR message are logged for 2 nodes .. I 
> suspect that this is because RF=3.  
> nodetool ring shows that all nodes are up.  
> Client connections (read / write) are not having issues..  
> nodetool version on all nodes shows that each node is 0.8.0
> At suggestion of some contributors, I have restarted each node and tried to 
> run a nodetool repair again ... the result is the same with the messages 
> being logged.
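
The stack trace above ends in HashMap's key iterator inside getNeighbors.  
One common way to get a ConcurrentModificationException there is to remove 
entries from a HashSet while a for-each loop is still iterating over it.  
The sketch below shows that failure mode and one way around it.  Whether 
getNeighbors does exactly this is an assumption based on the trace; the 
class, the method names, and the use of String endpoints are hypothetical.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Illustrative only: not the AntiEntropyService source. Endpoints are plain
// strings and "outdated" stands in for a per-node version lookup.
class NeighborFilterSketch {
    // Removes from the set inside a for-each loop. HashSet iterators are
    // fail-fast, so the next call to next() after a removal throws.
    static Set<String> filterUnsafe(Set<String> neighbors, Set<String> outdated) {
        for (String endpoint : neighbors) {
            if (outdated.contains(endpoint)) {
                neighbors.remove(endpoint);
            }
        }
        return neighbors;
    }

    // Removes through the iterator, which keeps its bookkeeping consistent.
    static Set<String> filterSafe(Set<String> neighbors, Set<String> outdated) {
        Iterator<String> it = neighbors.iterator();
        while (it.hasNext()) {
            if (outdated.contains(it.next())) {
                it.remove();
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        Set<String> outdated = new HashSet<>(Arrays.asList("10.0.0.1", "10.0.0.2"));
        try {
            filterUnsafe(new HashSet<>(outdated), outdated);
        } catch (ConcurrentModificationException e) {
            System.out.println("unsafe removal failed: " + e);
        }
        System.out.println("safe removal left: " + filterSafe(new HashSet<>(outdated), outdated));
    }
}
```

If that reading of the trace is right, the exception is a side effect of the 
exclusion path being taken, separate from the question of why the node was 
judged to be on 0.7 in the first place.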

