[jira] [Commented] (CASSANDRA-4583) Some nodes forget schema when 1 node fails

Pavel Yaskevich (JIRA) Wed, 29 Aug 2012 10:23:10 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444224#comment-13444224
 ]


Pavel Yaskevich commented on CASSANDRA-4583:
--------------------------------------------

Additionally, CASSANDRA-4432 and CASSANDRA-4561 are related to the timestamp 
problem.
                
> Some nodes forget schema when 1 node fails
> ------------------------------------------
>
>                 Key: CASSANDRA-4583
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4583
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.2
>         Environment: CentOS release 6.3 (Final)
>            Reporter: Edward Sargisson
>         Attachments: cass-4583-2-system.log, cass-4583-5-system.log
>
>
> At present we do not have a complete reproduction for this defect, but we are 
> raising it as requested by Aaron Morton. We will update this issue as we find 
> out more. If any additional logging or tests are requested, we will do them if 
> we can. 
> We have experienced 2 failures ascribed to this defect. On the cassandra user 
> mailing list Peter Schuller (2012-08-28) describes an additional failure.
> Reproduction steps as currently known:
> 1. Set up a cluster with 6 nodes (call them #1 through #6).
> 2. Have #5 fail completely. The first failure occurred when the node was 
> stopped to replace the battery in the hard disk cache. The second occurred 
> when the hardware monitoring recorded a problem: CPU usage was increasing 
> without explanation and the server console was frozen, so the machine was 
> restarted.
> 3. Bring #5 back.
> Expected behaviour:
> * #5 should rejoin the ring.
> Actual behaviour (based on the incident we saw yesterday):
> * #5 didn't rejoin the ring.
> * We stopped all nodes and started them one by one.
> * Nodes #2, #4 and #6 had forgotten most of their column families. They had 
> the keyspace, but with only one column family instead of the usual 9 or so.
> * We ran nodetool resetlocalschema on #2, #4 and #6.
> * We ran nodetool repair -pr on #2, #4, #5 and #6
> * On #2, nodetool repair appeared to crash: there were no messages from it in 
> the logs for more than 10 minutes, and nodetool compactionstats and nodetool 
> netstats showed no activity.
> * Restarting nodetool repair -pr fixed the problem and ran to completion.
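
The recovery commands listed in the report can be sketched as a small script.
This is only an illustration, not a confirmed fix for this issue. The host
names and the helper functions recovery_plan and run_plan are hypothetical
(the report only calls the nodes #1 through #6), and running the commands one
at a time is an assumption. The script prints the commands by default and
executes them only when dry_run is False.

```python
import subprocess

# Hypothetical host names; the report only refers to nodes #1 through #6.
SCHEMA_RESET_HOSTS = ["node2", "node4", "node6"]
REPAIR_HOSTS = ["node2", "node4", "node5", "node6"]


def recovery_plan(reset_hosts, repair_hosts):
    """Return the nodetool invocations from the report, in the reported order."""
    plan = []
    # Nodes that lost column families had their local schema reset first.
    for host in reset_hosts:
        plan.append(["nodetool", "-h", host, "resetlocalschema"])
    # A primary-range repair was then run on each affected node.
    for host in repair_hosts:
        plan.append(["nodetool", "-h", host, "repair", "-pr"])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_plan(recovery_plan(SCHEMA_RESET_HOSTS, REPAIR_HOSTS))
```

Keeping the plan as data makes it easy to review the exact commands before
anything is executed against a live cluster.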

