[ 
https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663994#comment-16663994
 ] 

Matt Byrd edited comment on CASSANDRA-11748 at 10/25/18 4:46 PM:
-----------------------------------------------------------------

I think it would be great to try to fix these related issues in the 4.0
timeframe. I'd be keen to try the approach outlined above; I'll have a go at
sketching it out in a PR to see what folks think.
To reiterate what I believe to be the fundamental problem:
The way we tee up a schema pull whenever a relevant gossip event shows a node
with a different schema version results in far too many superfluous pulls for
the same schema contents. With enough endpoints and a large enough schema, this
can cause the instance to run out of memory (OOM).
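To make the failure mode concrete, here is a minimal, hypothetical Java sketch of the pattern (it is not Cassandra's actual MigrationManager code; the class and method names are invented for illustration). Every gossip event that reports a differing schema version schedules its own pull, so repeated events about the same peer and the same version each cost another full schema transfer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical model of the per-gossip-event pull pattern; names are
// illustrative and do not correspond to Cassandra's real classes.
public class NaivePullDemo {
    private final UUID localVersion;
    // Each entry stands in for one scheduled schema pull (one full schema payload).
    private final List<String> scheduledPulls = new ArrayList<>();

    public NaivePullDemo(UUID localVersion) {
        this.localVersion = localVersion;
    }

    // Called for every relevant gossip event (e.g. node UP, state change).
    public void onGossipEvent(String endpoint, UUID remoteVersion) {
        if (!remoteVersion.equals(localVersion)) {
            // No de-duplication: identical events each schedule another pull.
            scheduledPulls.add(endpoint);
        }
    }

    public int pullCount() {
        return scheduledPulls.size();
    }

    public static void main(String[] args) {
        UUID local = UUID.fromString("f5270873-ba1f-39c7-ab2e-a86db868b09b");
        UUID remote = UUID.fromString("4cb463f8-5376-3baf-8e88-a5cc6a94f58f");
        NaivePullDemo node = new NaivePullDemo(local);
        // 100 gossip events about the same peer and the same remote version.
        for (int i = 0; i < 100; i++) {
            node.onGossipEvent("/192.168.88.33", remote);
        }
        System.out.println("pulls scheduled: " + node.pullCount()); // prints "pulls scheduled: 100"
    }
}
```

One pull would have been enough here, yet the node schedules one per event; with a large schema, every one of those pulls carries the whole schema.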

The solution proposed above addresses this by decoupling schema pulls from
incoming gossip messages. Instead, gossip is used only to update each node's
view of which other nodes have which schema version, and a separate thread
periodically checks that view and attempts to resolve any inconsistencies.
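A minimal sketch of that idea, assuming a hypothetical `SchemaReconciler` class and a caller-supplied `puller` callback that starts an asynchronous pull (neither exists in Cassandra; they are illustrative only). Gossip handlers only record endpoint -> version, and a periodic pass issues at most one in-flight pull per distinct remote version, however many gossip events arrived:

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical sketch: gossip only updates (endpoint -> schema version);
// a periodic task reconciles, with at most one pull per distinct remote version.
public class SchemaReconciler {
    private final Map<String, UUID> versionByEndpoint = new ConcurrentHashMap<>();
    private final Set<UUID> inFlight = ConcurrentHashMap.newKeySet();
    private final BiConsumer<String, UUID> puller; // hypothetical: starts an async pull
    private volatile UUID localVersion;

    public SchemaReconciler(UUID localVersion, BiConsumer<String, UUID> puller) {
        this.localVersion = localVersion;
        this.puller = puller;
    }

    // Gossip handler: cheap, idempotent bookkeeping only; never pulls.
    public void onGossipVersion(String endpoint, UUID version) {
        versionByEndpoint.put(endpoint, version);
    }

    // Must be called when a pull for `version` completes or fails.
    public void onPullFinished(UUID version, UUID newLocalVersion) {
        localVersion = newLocalVersion;
        inFlight.remove(version);
    }

    // One reconciliation pass; returns the number of pulls started.
    public int reconcileOnce() {
        int started = 0;
        for (Map.Entry<String, UUID> e : versionByEndpoint.entrySet()) {
            UUID version = e.getValue();
            // Skip versions we already have and versions already being fetched.
            if (version.equals(localVersion) || !inFlight.add(version)) {
                continue;
            }
            puller.accept(e.getKey(), version);
            started++;
        }
        return started;
    }

    // Run reconciliation periodically instead of on every gossip event.
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::reconcileOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Because `onGossipVersion` is just a map write, a burst of 100 identical gossip events collapses into one entry, and `reconcileOnce` starts a single pull for it. The caller must invoke `onPullFinished` when a pull ends, otherwise the version stays marked in-flight. Also, if a scheduled run throws, `ScheduledExecutorService` suppresses all subsequent executions, so a real implementation would need to catch and log errors inside the task.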
There are some details to flesh out, and I think an important part will be
ensuring we have tests that demonstrate the issues and show that we've fixed them.
I'm hoping we can leverage
[CASSANDRA-14821|https://issues.apache.org/jira/browse/CASSANDRA-14821] to do
so, though we may want to augment this with dtests or something else.
Let me know if you have any thoughts on the above approach; perhaps a sketch in
code will better illuminate it and help flush out potential problems.
[~iamaleksey] / [[email protected]] / [~michael.fong] / [~jjirsa] 


> Schema version mismatch may lead to Cassandra OOM at bootstrap during a
> rolling upgrade process
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Rolling upgrade process from 1.2.19 to 2.0.17. 
> CentOS 6.6
> Occurred on different C* nodes in deployments of different scales (2G ~ 5G)
>            Reporter: Michael Fong
>            Assignee: Matt Byrd
>            Priority: Critical
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We have observed multiple times that a multi-node C* (v2.0.17) cluster ran
> into OOM at bootstrap during a rolling upgrade from 1.2.19 to 2.0.17.
> Here is a simple outline of our rolling upgrade process:
> 1. Update the schema on a node, and wait until all nodes are in schema version
> agreement (checked via nodetool describecluster).
> 2. Restart a Cassandra node.
> 3. After the restart, there is a chance that the restarted node has a different
> schema version.
> 4. All nodes in the cluster start rapidly exchanging schema information, and any
> node could run into OOM.
> The following is the system.log output from one of our 2-node cluster test
> beds:
> ----------------------------------
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2:
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task to the other node, more than 100
> times:
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> ... (100+ times)
> ----------------------------------
> On the other hand, Node 1 keeps updating its gossip information, then
> receives migration requests and submits migration tasks of its own:
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> ... (100+ times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> ... (50+ times)
> As a side note, we have 200+ column families defined in the Cassandra
> database, which may be related to this amount of RPC traffic.
> P.S.2 The excess schema migration requests eventually cause
> InternalResponseStage to perform schema merge operations. Each merge requires
> a compaction, so this stage consumes responses much more slowly than they
> arrive. The resulting backlog of incoming schema migration content objects
> consumes all of the heap space and ultimately ends in OOM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
