[jira] [Commented] (CASSANDRA-7734) Schema pushes (seemingly) randomly not happening

graham sanderson (JIRA) Sun, 10 Aug 2014 12:54:25 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092169#comment-14092169 ]


graham sanderson commented on CASSANDRA-7734:
---------------------------------------------

Note that this is not a problem _during_ the upgrade; it is a problem after the 
upgrade, with all nodes successfully on 2.0.9.

I'm a bit confused from a technical perspective, so would welcome any comments 
from others who have been near this code: [~iamaleksey], [~jbellis]

I'm not sure of the lifecycle of IncomingTcpConnection, but its close method 
contains this call:

{code}
MessagingService.instance().resetVersion(from);
{code}

That unsets the (statically scoped) version for an endpoint when a connection 
closes. I would assume there could be overlapping connections for an endpoint, 
so this seems undesirable?
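
A minimal sketch of the concern, not Cassandra's actual code: the class below is a hypothetical stand-in for the endpoint->version table, keyed by a plain string and using an arbitrary placeholder version number. It assumes each incoming connection records the peer's version when it opens and calls resetVersion when it closes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the shared endpoint->version table.
// This is NOT MessagingService; names and types are illustrative only.
final class VersionTableSketch {
    private final ConcurrentMap<String, Integer> versions = new ConcurrentHashMap<>();

    void setVersion(String endpoint, int version) { versions.put(endpoint, version); }
    void resetVersion(String endpoint) { versions.remove(endpoint); }
    boolean knowsVersion(String endpoint) { return versions.containsKey(endpoint); }

    // Returns whether the peer's version is still known after the older of
    // two overlapping connections from the same peer has closed.
    static boolean versionSurvivesOverlappingClose() {
        VersionTableSketch table = new VersionTableSketch();
        String peer = "10.0.0.1";       // placeholder endpoint
        int placeholderVersion = 7;     // arbitrary value for illustration

        table.setVersion(peer, placeholderVersion); // connection A opens
        table.setVersion(peer, placeholderVersion); // connection B opens while A is still up
        table.resetVersion(peer);                   // connection A closes
        return table.knowsVersion(peer);            // B is still open at this point
    }

    public static void main(String[] args) {
        System.out.println("version known after A closes: "
                + versionSurvivesOverlappingClose());
    }
}
```

Under those assumptions the method returns false: the entry is gone even though connection B is still open, and nothing restores it until some later event sets the version again.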

Also

{code}
MessagingService.instance().knowsVersion(endpoint) &&
MessagingService.instance().getRawVersion(endpoint) == MessagingService.current_version
{code}

Since the endpoint->version mapping is static, global and concurrently 
modified, we shouldn't be checking it twice: the entry can change between the 
knowsVersion() call and the getRawVersion() call.
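
One way to avoid the double lookup is to read the map once and test the result. The sketch below is only an illustration under assumptions: the class, the string-keyed map, the shouldPush name and the CURRENT_VERSION value are all hypothetical, not the real MessagingService API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a single-read version check.
final class SingleReadSketch {
    static final int CURRENT_VERSION = 7; // placeholder value, for illustration only

    private final ConcurrentMap<String, Integer> versions = new ConcurrentHashMap<>();

    void setVersion(String endpoint, int version) { versions.put(endpoint, version); }
    void resetVersion(String endpoint) { versions.remove(endpoint); }

    // Reads the map exactly once, so a concurrent resetVersion() cannot slip
    // in between a "known?" check and an "equal?" check.
    boolean shouldPush(String endpoint) {
        Integer version = versions.get(endpoint); // null means "unknown"
        return version != null && version == CURRENT_VERSION;
    }

    public static void main(String[] args) {
        SingleReadSketch sketch = new SingleReadSketch();
        sketch.setVersion("10.0.0.1", CURRENT_VERSION);
        System.out.println("push to known peer: " + sketch.shouldPush("10.0.0.1"));
        System.out.println("push to unknown peer: " + sketch.shouldPush("10.0.0.2"));
    }
}
```

This only makes the decision internally consistent; it does not by itself decide whether an unknown version should block the push.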

Also, CASSANDRA-6700 changes knowsVersion() as follows:

     public boolean knowsVersion(InetAddress endpoint)
     {
-        return versions.get(endpoint) != null;
+        return versions.containsKey(endpoint);
     }

However, it is not clear that the map can ever contain a null value, and the 
getVersion() method still does the check the old way 
(versions.get(endpoint) != null).
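
To make the behavioural difference concrete, here is a hedged sketch of the two push predicates quoted in the issue description below, applied to an endpoint whose version is unknown. It models only the fallback described there (getVersion() returning the current version when nothing is recorded). The class, method names, string keys and version constant are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the 2.0.5-style and 2.0.9-style push decisions.
final class PushPredicateSketch {
    static final int CURRENT_VERSION = 7; // placeholder value, for illustration only

    private final Map<String, Integer> versions = new HashMap<>();

    void setVersion(String endpoint, int version) { versions.put(endpoint, version); }

    // Models the described fallback: an unknown endpoint is treated as current.
    int getVersion(String endpoint) {
        Integer version = versions.get(endpoint);
        return version == null ? CURRENT_VERSION : version;
    }

    // 2.0.5-style: skip only peers whose (possibly defaulted) version is older.
    boolean pushesOldStyle(String endpoint) {
        return getVersion(endpoint) >= CURRENT_VERSION;
    }

    // 2.0.9-style: push only when the version is known and equal to current.
    boolean pushesNewStyle(String endpoint) {
        Integer version = versions.get(endpoint);
        return version != null && version == CURRENT_VERSION;
    }

    public static void main(String[] args) {
        PushPredicateSketch sketch = new PushPredicateSketch();
        String unknownPeer = "10.0.0.9"; // never registered in the map
        System.out.println("old-style push to unknown peer: " + sketch.pushesOldStyle(unknownPeer));
        System.out.println("new-style push to unknown peer: " + sketch.pushesNewStyle(unknownPeer));
    }
}
```

In this model an unknown endpoint gets a push under the old predicate but not under the new one, which would match the symptom if the recorded version is sometimes lost.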

In any case, I'm partly confused because I'm not quite sure how this endpoint 
version tracking is supposed to work, and the current state seems to have 
evolved as a result of lots of different issues (I don't think I've captured 
all of them here).

> Schema pushes (seemingly) randomly not happening
> ------------------------------------------------
>
>                 Key: CASSANDRA-7734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7734
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: graham sanderson
>
> We have been seeing problems since upgrade to 2.0.9 from 2.0.5.
> Basically after a while, schema changes start propagating slowly from some 
> nodes to others. It looks from the logs and trace that in this case the 
> "push" of the schema never happens (note that once a node has decided not to 
> push to another node, it doesn't seem to start again). In this case though, we do see 
> the other node end up pulling the request some time later when it notices its 
> schema is out of date.
> Here is code from 2.0.9 MigrationManager.announce
> {code}
>         for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
>         {
>             // only push schema to nodes with known and equal versions
>             if (!endpoint.equals(FBUtilities.getBroadcastAddress()) &&
>                     MessagingService.instance().knowsVersion(endpoint) &&
>                     MessagingService.instance().getRawVersion(endpoint) == MessagingService.current_version)
>                 pushSchemaMutation(endpoint, schema);
>         }
> {code}
> and from 2.0.5
> {code}
>         for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
>         {
>             if (endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 continue; // we've dealt with localhost already
>             // don't send schema to the nodes with the versions older than current major
>             if (MessagingService.instance().getVersion(endpoint) < MessagingService.current_version)
>                 continue;
>             pushSchemaMutation(endpoint, schema);
>         }
> {code}
> The old getVersion() call would return MessagingService.current_version if 
> the version was unknown, so the push would occur in this case. I don't have 
> logging to prove this, but have a strong suspicion that the version may end up 
> null in some cases (which would have allowed schema propagation in 2.0.5, but 
> not in some later release up to and including 2.0.9).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
