[
https://issues.apache.org/jira/browse/CASSANDRA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Schuller resolved CASSANDRA-2015.
---------------------------------------
Resolution: Invalid
> Propagation of schema changes got out of sync with node's notion of ring
> ------------------------------------------------------------------------
>
> Key: CASSANDRA-2015
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2015
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Peter Schuller
>
> I have a three-node 0.7.0 test cluster (nodes 1, 2, and 3). Nodes 1 and 2
> are seeds; node 3 is not.
> I had a situation where the following was observed:
> * Schema changes submitted to node 1 would not propagate to any other node
> (observational method: tail syslog and don't see any flushing of system
> memtables/etc except on node 1).
> * Schema changes submitted to node 2 or 3 would propagate between them, or to
> all (not sure which).
> * Mutations submitted on node 1 *would* get propagated to node 3.
> * All nodes knew of each other and reported one another as up according to
> 'nodetool ring'.
> * Because node 3 never got schema migrations, writes submitted to node 1 that
> got sent to node 3 blocked for extended periods of time on node 1, while
> triggering an exception on node 3 because of an invalid cfid in the row
> mutation.
> * I cannot be entirely sure whether just a regular restart would have fixed
> the problem.
> Unfortunately, I was not aware of the problem until running some unit tests
> against the cluster and I cannot say for sure which order the machines were
> bootstrapped in.
> After the initial discovery I switched to manually submitting 'create keyspace
> x;' via cassandra-cli on each node (using different keyspaces, or interleaving
> create/drop), and observing the results in syslog.
> The observations w.r.t. row mutations did not come from the manual test, but
> rather from the unit test that failed so there is some chance that there was
> a different mode of failure than during my cassandra-cli tests.
> Stopping all nodes, wiping the data directories, and restarting fixed the
> problem and so far I have not been able to trigger it again. I am not sure
> whether just restarting the nodes would have fixed it.
> It definitely seems like a problem to me that schema changes did not
> propagate even though node 1 was apparently sufficiently aware of
> node 3 to send mutations to it, even if the original problem may
> have been due to some kind of operational error.
> I'd be interested in hearing speculation about what the likely triggers may be.
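
The symptom described above (node 1 on one schema, nodes 2 and 3 possibly on another) amounts to the nodes disagreeing on a schema version. As a rough sketch only, the check below groups nodes by the schema version each one reports and flags a disagreement. How those versions are collected is deliberately left out, and the node names and version strings are made-up placeholders, not values taken from the cluster in this report.

```python
from collections import defaultdict


def group_by_schema_version(versions_by_node):
    """Group node names by the schema version each node reports.

    versions_by_node: dict mapping a node name to its schema version string.
    Returns a dict mapping each version to a sorted list of node names.
    """
    groups = defaultdict(list)
    for node, version in versions_by_node.items():
        groups[version].append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


def schema_in_agreement(versions_by_node):
    """Return True when every node reports the same schema version."""
    return len(set(versions_by_node.values())) <= 1


# Hypothetical observations: node1 has applied a migration the others lack.
observed = {"node1": "v2", "node2": "v1", "node3": "v1"}

if not schema_in_agreement(observed):
    for version, nodes in group_by_schema_version(observed).items():
        print(version, "->", ", ".join(nodes))
```

A check like this only shows that the nodes diverged; it does not explain why a node that could still deliver mutations failed to deliver migrations.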