[
https://issues.apache.org/jira/browse/CASSANDRA-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635389#comment-13635389
]
Ryan McGuire commented on CASSANDRA-5178:
-----------------------------------------
I'm trying to simplify this process a bit from what you've described; so far I
have not been able to reproduce this behaviour on 1.1.7. Here's my process:
Bring up a 4-node cluster with two datacenters:
{code}
Address        DC   Rack  Status  State   Load      Owns    Token
                                                            85070591730234615865843651857942052964
192.168.1.141  dc1  r1    Up      Normal  11.13 KB  50.00%  0
192.168.1.145  dc2  r1    Up      Normal  11.1 KB   0.00%   100
192.168.1.143  dc1  r1    Up      Normal  11.11 KB  50.00%  85070591730234615865843651857942052864
192.168.1.133  dc2  r1    Up      Normal  11.1 KB   0.00%   85070591730234615865843651857942052964
{code}
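For reference, a minimal sketch of the per-node settings that would produce a
layout like the one above (assuming PropertyFileSnitch; the IPs and tokens are
taken from the ring output, everything else is illustrative):
{code}
# cassandra-topology.properties (assuming PropertyFileSnitch)
192.168.1.141=dc1:r1
192.168.1.143=dc1:r1
192.168.1.145=dc2:r1
192.168.1.133=dc2:r1
default=dc1:r1

# cassandra.yaml, per node:
#   endpoint_snitch: PropertyFileSnitch
#   initial_token: 0                                        (192.168.1.141)
#   initial_token: 85070591730234615865843651857942052864   (192.168.1.143)
#   initial_token: 100                                      (192.168.1.145)
#   initial_token: 85070591730234615865843651857942052964   (192.168.1.133)
{code}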
Manually shut down dc2.
{code}
Address        DC   Rack  Status  State   Load      Owns    Token
                                                            85070591730234615865843651857942052964
192.168.1.141  dc1  r1    Up      Normal  11.13 KB  50.00%  0
192.168.1.145  dc2  r1    Down    Normal  15.53 KB  0.00%   100
192.168.1.143  dc1  r1    Up      Normal  15.88 KB  50.00%  85070591730234615865843651857942052864
192.168.1.133  dc2  r1    Down    Normal  15.53 KB  0.00%   85070591730234615865843651857942052964
{code}
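(A sketch of one way to stop the dc2 nodes; the exact mechanism doesn't matter,
and the pkill pattern is just an assumption about how the daemon was started:)
{code}
# on each dc2 node (192.168.1.145, 192.168.1.133)
pkill -f CassandraDaemon
{code}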
Create schema:
{code}
CREATE KEYSPACE ryan WITH strategy_class = 'NetworkTopologyStrategy' AND
strategy_options:dc1 = '2';
CREATE TABLE ryan.test (n int primary key, x int);
{code}
Create data to import:
{code}
seq 500000 | sed 's/$/,1/' | split -l 250000 - data_
{code}
Write the first data set to dc1:
{code}
COPY ryan.test FROM 'data_aa';
{code}
Verify dc1 has all the data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
count
--------
250000
{code}
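The per-datacenter counts here and below come from pointing cqlsh at a node in
the datacenter being checked, so a CL ONE read is served by replicas in that
DC. A sketch (exact cqlsh flags vary by version):
{code}
# count as seen from dc1 (coordinator in dc1)
echo "SELECT count(*) FROM ryan.test limit 99999999;" | cqlsh 192.168.1.141

# count as seen from dc2 (coordinator in dc2)
echo "SELECT count(*) FROM ryan.test limit 99999999;" | cqlsh 192.168.1.145
{code}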
Bring up dc2, then add it to the replication strategy:
{code}
ALTER KEYSPACE ryan WITH strategy_class = 'NetworkTopologyStrategy' AND
strategy_options:dc1 = '2' AND strategy_options:dc2 = '2';
{code}
Verify dc2 has no data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
count
--------
0
{code}
Verify dc1 has all the data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
count
--------
250000
{code}
Write the second data set to dc1 with local_quorum consistency:
{code}
COPY ryan.test FROM 'data_ab';
{code}
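The COPY line above doesn't show the consistency level; one way to get
LOCAL_QUORUM for the import is to set it for the cqlsh session first (where the
CONSISTENCY command is available; older CQL used per-statement USING
CONSISTENCY instead). A sketch of the session-level form:
{code}
CONSISTENCY LOCAL_QUORUM;
COPY ryan.test FROM 'data_ab';
{code}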
{code}
Address        DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                         85070591730234615865843651857942052964
192.168.1.141  dc1  r1    Up      Normal  12.39 MB  100.00%              0
192.168.1.145  dc2  r1    Up      Normal  6.33 MB   100.00%              100
192.168.1.143  dc1  r1    Up      Normal  12.72 MB  100.00%              85070591730234615865843651857942052864
192.168.1.133  dc2  r1    Up      Normal  6.33 MB   100.00%              85070591730234615865843651857942052964
{code}
Verify dc1 has all the data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
count
--------
500000
{code}
Verify dc2 has only half the data written:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
count
--------
250000
{code}
Run repair from dc1:
{code}
nodetool repair
{code}
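Concretely, against one of the dc1 nodes; something like the following
(repairing just the test keyspace, optionally with -pr on each node to limit
each run to that node's primary range):
{code}
nodetool -h 192.168.1.141 repair ryan
{code}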
{code}
Address        DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                         85070591730234615865843651857942052964
192.168.1.141  dc1  r1    Up      Normal  27.12 MB  100.00%              0
192.168.1.145  dc2  r1    Up      Normal  22.78 MB  100.00%              100
192.168.1.143  dc1  r1    Up      Normal  12.72 MB  100.00%              85070591730234615865843651857942052864
192.168.1.133  dc2  r1    Up      Normal  16.44 MB  100.00%              85070591730234615865843651857942052964
{code}
Verify that dc2 has all the data:
{code}
SELECT count(*) FROM ryan.test limit 99999999;
count
--------
500000
{code}
I'll try adding more nodes and settings to better approximate your setup.
> Sometimes repair process doesn't work properly
> ----------------------------------------------
>
> Key: CASSANDRA-5178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5178
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.1.7
> Reporter: Vladimir Barinov
> Assignee: Ryan McGuire
> Priority: Minor
>
> Pre-conditions:
> 1. We have two separate datacenters, called "DC1" and "DC2" respectively.
> Each of them contains 6 nodes.
> 2. DC2 is disabled.
> 3. Tokens for DC1 are calculated via
> https://raw.github.com/riptano/ComboAMI/2.2/tokentoolv2.py. Tokens for DC2
> are the same as for DC1 but with an offset of +100, so token 0 in DC1
> corresponds to token 100 in DC2, and so on (see the sketch below the list).
> 4. We have a test data set (1 million keys).
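> (Illustrative sketch of how those tokens work out: evenly spaced
> RandomPartitioner tokens for 6 nodes, with DC2 shifted by +100.)
> {noformat}
> python -c 'print([i * 2**127 // 6 for i in range(6)])'        # DC1 tokens
> python -c 'print([i * 2**127 // 6 + 100 for i in range(6)])'  # DC2 tokens
> {noformat}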
> *Steps to reproduce:*
> *Step 1:*
> Let's check the current configuration.
> nodetool ring:
>
> {quote}
> {noformat}
> <ip>  DC1  RAC1  Up  Normal  44,53 KB  33,33%  0
> <ip>  DC1  RAC1  Up  Normal  51,8 KB   33,33%  28356863910078205288614550619314017621
> <ip>  DC1  RAC1  Up  Normal  21,82 KB  33,33%  56713727820156410577229101238628035242
> <ip>  DC1  RAC1  Up  Normal  21,82 KB  33,33%  85070591730234615865843651857942052864
> <ip>  DC1  RAC1  Up  Normal  51,8 KB   33,33%  113427455640312821154458202477256070485
> <ip>  DC1  RAC1  Up  Normal  21,82 KB  33,33%  141784319550391026443072753096570088106
> {noformat}
> {quote}
> *Current schema:*
> {quote}
> {noformat}
> create keyspace benchmarks
> with placement_strategy = 'NetworkTopologyStrategy'
> *and strategy_options = \{DC1 : 2};*
> use benchmarks;
> create column family test_family
> with compaction_strategy =
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
> ...
> and compaction_strategy_options = \{'sstable_size_in_mb' : '20'}
> and compression_options = \{'chunk_length_kb' : '32',
> 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
> {noformat}
> {quote}
> *STEP 2:*
> Write the first part of the test data set (500 000 keys) to DC1 with the
> LOCAL_QUORUM consistency level.
> *STEP 3:*
> Update cassandra.yaml and cassandra-topology.properties with the new IPs from
> DC2, and update the current keyspace schema with *strategy_options = \{DC1 :
> 2, DC2 : 0};*
> *STEP 4:*
> Start all nodes from DC2.
> Check that nodes are started successfully:
> {quote}
> {noformat}
> <ip>  DC1  RAC1  Up  Normal  11,4 MB   33,33%  0
> <ip>  DC2  RAC2  Up  Normal  27,7 KB   0,00%   100
> <ip>  DC1  RAC1  Up  Normal  11,34 MB  33,33%  28356863910078205288614550619314017621
> <ip>  DC2  RAC2  Up  Normal  42,69 KB  0,00%   28356863910078205288614550619314017721
> <ip>  DC1  RAC1  Up  Normal  11,37 MB  33,33%  56713727820156410577229101238628035242
> <ip>  DC2  RAC2  Up  Normal  52,02 KB  0,00%   56713727820156410577229101238628035342
> <ip>  DC1  RAC1  Up  Normal  11,4 MB   33,33%  85070591730234615865843651857942052864
> <ip>  DC2  RAC2  Up  Normal  42,69 KB  0,00%   85070591730234615865843651857942052964
> <ip>  DC1  RAC1  Up  Normal  11,43 MB  33,33%  113427455640312821154458202477256070485
> <ip>  DC2  RAC2  Up  Normal  42,69 KB  0,00%   113427455640312821154458202477256070585
> <ip>  DC1  RAC1  Up  Normal  11,39 MB  33,33%  141784319550391026443072753096570088106
> <ip>  DC2  RAC2  Up  Normal  42,69 KB  0,00%   141784319550391026443072753096570088206
> {noformat}
> {quote}
> *STEP 5:*
> Update keyspace schema with *strategy_options = \{DC1 : 2, DC2 : 2};*
> *STEP 6:*
> Write the last 500 000 keys of the test data set to DC1 with the
> *LOCAL_QUORUM* consistency level.
> *STEP 7:*
> Check that the first part of the test data set (the first 500 000 keys) was
> written correctly to DC1.
> Check that the last part of the test data set (the last 500 000 keys) was
> written correctly to both datacenters.
> *STEP 8:*
> Run *nodetool repair* on each node of DC2 and wait until it completes.
> *STEP 9:*
> Current nodetool ring:
> {quote}
> {noformat}
> <ip>  DC1  RAC1  Up  Normal  21,45 MB  33,33%  0
> <ip>  DC2  RAC2  Up  Normal  23,5 MB   33,33%  100
> <ip>  DC1  RAC1  Up  Normal  20,67 MB  33,33%  28356863910078205288614550619314017621
> <ip>  DC2  RAC2  Up  Normal  23,55 MB  33,33%  28356863910078205288614550619314017721
> <ip>  DC1  RAC1  Up  Normal  21,18 MB  33,33%  56713727820156410577229101238628035242
> <ip>  DC2  RAC2  Up  Normal  23,5 MB   33,33%  56713727820156410577229101238628035342
> <ip>  DC1  RAC1  Up  Normal  23,5 MB   33,33%  85070591730234615865843651857942052864
> <ip>  DC2  RAC2  Up  Normal  23,55 MB  33,33%  85070591730234615865843651857942052964
> <ip>  DC1  RAC1  Up  Normal  21,44 MB  33,33%  113427455640312821154458202477256070485
> <ip>  DC2  RAC2  Up  Normal  23,46 MB  33,33%  113427455640312821154458202477256070585
> <ip>  DC1  RAC1  Up  Normal  20,53 MB  33,33%  141784319550391026443072753096570088106
> <ip>  DC2  RAC2  Up  Normal  23,55 MB  33,33%  141784319550391026443072753096570088206
> {noformat}
> {quote}
> Check that the full test data set has been written to both datacenters.
> Result:
> The full test data set was successfully written to DC1, but *24448* keys are
> not present on DC2.
> Repeating *nodetool repair* doesn't help.
> Conclusion:
> It seems that the problem is related to the process of identifying which keys
> must be repaired when the target datacenter already has some keys.
> If we start the empty DC2 nodes after DC1 has received all 1 000 000 keys,
> *nodetool repair* works fine, with no missing keys.