[ 
https://issues.apache.org/jira/browse/CASSANDRA-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734844#comment-14734844
 ] 

Sylvain Lebresne commented on CASSANDRA-9761:
---------------------------------------------

Pushed a branch for this 
[here|https://github.com/pcmanus/cassandra/commits/9761]. I've resurrected 
(quite blindly) the "areAllNodesOn21" test we have in 2.1 (but it now tests for 
2.2), and the role manager uses that to decide whether it should set itself up. 
If not, it reschedules another attempt a bit later (it reuses 
{{cassandra.superuser_setup_delay_ms}} for the delay, which imo is fine. The 
default is 10s but the {{areAllNodesOn22}} check is pretty cheap. Open to 
alternatives though if someone feels it's not adequate). This seems to work 
fine, but I'm not all that familiar with the role manager, so hopefully I 
haven't missed anything.
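
For reviewers who want the shape of the retry logic without opening the branch, 
here is a minimal sketch of the idea, not the code in the patch. The class 
{{DeferredSetup}} and all of its members are hypothetical names made up for 
illustration. The {{BooleanSupplier}} stands in for the {{areAllNodesOn22}} 
check and the {{Runnable}} for the role manager setup. Delaying the first 
attempt by the same interval is also an assumption.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from the Cassandra code base.
final class DeferredSetup {
    private final ScheduledExecutorService executor;
    private final BooleanSupplier allPeersUpgraded; // stand-in for the areAllNodesOn22 check
    private final Runnable setup;                   // stand-in for the role manager setup
    private final long retryDelayMs;                // e.g. the value of cassandra.superuser_setup_delay_ms

    DeferredSetup(ScheduledExecutorService executor,
                  BooleanSupplier allPeersUpgraded,
                  Runnable setup,
                  long retryDelayMs) {
        this.executor = executor;
        this.allPeersUpgraded = allPeersUpgraded;
        this.setup = setup;
        this.retryDelayMs = retryDelayMs;
    }

    // Schedule an attempt after the configured delay (assumed to apply to the first try too).
    void start() {
        executor.schedule(this::attempt, retryDelayMs, TimeUnit.MILLISECONDS);
    }

    private void attempt() {
        if (!allPeersUpgraded.getAsBoolean()) {
            // Some peer is still on an older version: skip setup and try again later.
            start();
            return;
        }
        // Every peer is upgraded, so the setup is expected to succeed now.
        setup.run();
    }
}
```

Because the check is cheap, re-running it every delay interval costs little, and 
no setup request reaches a non-upgraded peer in the meantime.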

The results for the dtests are 
[here|http://cassci.datastax.com/job/pcmanus-9761-dtest/1/]. I got 2 weird and 
apparently unrelated failures on the unit tests, however, so I've rebased the 
branch onto the latest changes and am re-running those. Otherwise, the patch 
is ready for review.


> Delay auth setup until peers are upgraded
> -----------------------------------------
>
>                 Key: CASSANDRA-9761
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9761
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Tunnicliffe
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0.0 rc1, 2.2.2
>
>
> The built in auth classes {{CassandraRoleManager}} and 
> {{CassandraAuthorizer}} both attempt to do some setup and data conversion 
> when a node is upgraded to version 2.2 or higher. At the moment, each node 
> attempts the operations with the expectation that this will fail until enough 
> of the cluster has been upgraded for it to succeed (i.e. enough nodes have 
> the latest schema with the requisite new tables). These expected failures are 
> largely harmless, but they are annoying because they cause the receiving node 
> (the non-upgraded node) to close the connection with the upgraded node, which 
> then has to be re-established. Although this is the normal behaviour on schema 
> disagreement (see CASSANDRA-9136 for further discussion), it may be possible 
> to avoid in this specific circumstance. Given that we expect the operations 
> to fail until enough nodes are upgraded, we could defer them until we're sure 
> they can succeed by checking the messaging service version of peers. 
> Right now these are a one-shot thing: each node only makes one attempt at the 
> conversion (until it is restarted). Without investigating further, I don't 
> know if we'd need to add in retries in case it takes a little time for each 
> peer's MS version to be updated as they're upgraded. The setup & conversion 
> operations are idempotent, so there shouldn't be a great issue if several 
> nodes attempt them at the same time anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
