[ 
https://issues.apache.org/jira/browse/CASSANDRA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller resolved CASSANDRA-2015.
---------------------------------------

    Resolution: Invalid

> Propagation of schema changes got out of sync with node's notion of ring
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2015
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2015
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Peter Schuller
>
> I have a three-node 0.7.0 test cluster: nodes 1, 2 and 3. Nodes 1 and 2 are 
> seeds; node 3 is not.
> I had a situation where the following was observed:
> * Schema changes submitted to node 1 would not propagate to any other node 
> (observational method: tail syslog and don't see any flushing of system 
> memtables/etc except on node 1).
> * Schema changes submitted to node 2 or 3 would propagate between them, or to 
> all (not sure which).
> * Mutations submitted on node 1 *would* get propagated to node 3.
> * All nodes knew of each other and considered themselves up according to 
> 'nodetool ring'.
> * Because node 3 never got schema migrations, writes submitted to node 1 that 
> got sent to node 3 blocked for extended periods of time on node 1, while 
> triggering an exception on node 3 because of an invalid cfid in the row 
> mutation.
> * I cannot be entirely sure whether just a regular restart would have fixed 
> the problem.
> Unfortunately, I was not aware of the problem until running some unit tests 
> against the cluster and I cannot say for sure which order the machines were 
> bootstrapped in.
> After initial discovery I switched to manually submitting 'create keyspace 
> x;' via cassandra-cli on each node (using different keyspaces, or 
> interleaving create/drop), and observing the results in syslog.
> The observations w.r.t. row mutations did not come from the manual test, but 
> rather from the unit test that failed, so there is some chance that there was 
> a different mode of failure than during my cassandra-cli tests.
> Stopping all nodes, wiping the data directories, and restarting fixed the 
> problem, and so far I have not been able to trigger it again. I am not sure 
> whether just restarting the nodes would have fixed it.
> It definitely seems like a problem to me that schema changes did not 
> propagate even though node 1 was apparently sufficiently aware of 
> node 3 to send mutations to it, even if the original problem may 
> have been due to some kind of operational error.
> I'd be interested in hearing speculation about what the likely triggers may 
> be.
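
The symptom in the description (all nodes up according to 'nodetool ring',
but schema changes stuck on one node) amounts to the nodes holding
different schema versions. A minimal sketch of a check for that condition
follows. It assumes the caller already has a mapping from schema version
to the nodes reporting it; the function name, the version strings and the
node names are all hypothetical, and how the mapping is obtained from a
real cluster is left open.

```python
def schema_disagreement(versions_to_nodes):
    """Return {version: sorted nodes} when more than one schema version
    is in use, or an empty dict when all nodes agree.

    versions_to_nodes is an assumed input shape: a dict mapping a schema
    version string to the list of nodes reporting that version.
    """
    # Ignore versions that no node currently reports.
    live = {v: sorted(nodes) for v, nodes in versions_to_nodes.items() if nodes}
    return live if len(live) > 1 else {}


# Hypothetical snapshot resembling the report: node 1 applied a schema
# change that nodes 2 and 3 never received.
snapshot = {
    "version-a": ["node1"],
    "version-b": ["node3", "node2"],
}

for version, nodes in sorted(schema_disagreement(snapshot).items()):
    print(version, nodes)
```

An empty result means every node reports the same schema version; a
non-empty one lists which nodes sit on which version, which is the
information that had to be inferred from syslog above.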

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
