[
https://issues.apache.org/jira/browse/CASSANDRA-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444224#comment-13444224
]
Pavel Yaskevich commented on CASSANDRA-4583:
--------------------------------------------
Additionally, CASSANDRA-4432 and CASSANDRA-4561 are related to the timestamp
problem.
> Some nodes forget schema when 1 node fails
> ------------------------------------------
>
> Key: CASSANDRA-4583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4583
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.2
> Environment: CentOS release 6.3 (Final)
> Reporter: Edward Sargisson
> Attachments: cass-4583-2-system.log, cass-4583-5-system.log
>
>
> At present we do not have a complete reproduction for this defect, but we are
> raising it as requested by Aaron Morton. We will update as we find out
> more. If any additional logging or tests are requested, we will do them if we
> can.
> We have experienced 2 failures ascribed to this defect. On the cassandra user
> mailing list Peter Schuller (2012-08-28) describes an additional failure.
> Reproduction steps as currently known:
> 1. Set up a cluster with 6 nodes (call them #1 through #6).
> 2. Have #5 fail completely. One failure was when the node was stopped to
> replace the battery in the hard disk cache. The second failure was when the
> hardware monitoring recorded a problem: CPU usage was increasing without
> explanation and the server console was frozen, so the machine was restarted.
> 3. Bring #5 back
> Expected behaviour:
> * #5 should rejoin the ring.
> Actual behaviour (based on the incident we saw yesterday):
> * #5 didn't rejoin the ring.
> * We stopped all nodes and started them one by one.
> * Nodes #2, #4 and #6 had forgotten most of their column families. They had the
> keyspace but with only one column family instead of the usual 9 or so.
> * We ran nodetool resetlocalschema on #2, #4 and #6.
> * We ran nodetool repair -pr on #2, #4, #5 and #6
> * On #2, nodetool repair appeared to crash, in that there were no messages in
> the logs from it for 10+ minutes. nodetool compactionstats and nodetool
> netstats showed no activity.
> * Restarting nodetool repair -pr fixed the problem, and the repair ran to
> completion.
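
The recovery commands listed in the description can be sketched as a dry run. This is only an illustration of the order of operations, not a verified fix for the defect. The host names (node2, node4, ...) and the helper recovery_commands are hypothetical, since the report identifies nodes only as #1 through #6; the nodetool subcommands and the -h and -pr options are the ones named in the report or standard to nodetool.

```python
import shlex

# Hypothetical host names; the report only identifies nodes as #2, #4, #5, #6.
SCHEMA_RESET_NODES = ["node2", "node4", "node6"]
REPAIR_NODES = ["node2", "node4", "node5", "node6"]


def recovery_commands(reset_nodes, repair_nodes):
    """Return nodetool command lines in the order the report describes."""
    commands = []
    for host in reset_nodes:
        # Discard the node's local schema so it resyncs from the other nodes.
        commands.append(["nodetool", "-h", host, "resetlocalschema"])
    for host in repair_nodes:
        # -pr limits the repair to the node's primary range.
        commands.append(["nodetool", "-h", host, "repair", "-pr"])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in recovery_commands(SCHEMA_RESET_NODES, REPAIR_NODES):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands rather than running them keeps the sketch safe to try; each line could be passed to subprocess.run once the host names are replaced with real ones.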