On 04/05/2011 03:49 PM, Jonathan Ellis wrote:
> Sounds like https://issues.apache.org/jira/browse/CASSANDRA-2324
Yes, that sounds like the issue I'm having. Any chance of a fix for this
being backported to 0.7.x?

Anyway, I guess I might as well share the test case I've used to
reproduce this problem:

============================================================
Cluster configuration: 6 nodes running 0.7.4 with RF=3

1. Create the keyspace and column families (see attached repair_test.py).

2. Insert 20 100MB keys into each of column families A, B and C:

   $ python repair_test.py

   This results in 2.4GB worth of sstables on node1:

   $ du -sh /data/cassandra/data/repair_test3/
   2.4G    /data/cassandra/data/repair_test3/

3. Run repair:

   $ time nodetool -h node1 repair repair_test3
   real    3m28.218s

   The repair logged streaming of 1 to 3 ranges for each column family,
   the sstable directory filled up with a bunch of "<column-family>-tmp-"
   files, and disk usage peaked at 10+GB. The repair completed
   successfully and disk usage settled at 6.4GB:

   $ du -sh /data/cassandra/data/repair_test3/
   6.4G    /data/cassandra/data/repair_test3/

4. Run repair again:

   $ time nodetool -h node1 repair repair_test3
   real    9m23.514s

   This time disk usage peaked at 25+GB and then settled at 4.7GB, and
   repair reported that even more ranges were out of sync.

So this issue seems to cause repair to take a very long time, to
unnecessarily send a lot of data over the network, and to leave a lot
of "air" in the resulting sstables that can only be reclaimed by
triggering major compactions.

(A GC was triggered before all disk usage measurements.)
============================================================

Regards,
Jonas
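For what it's worth, the 2.4GB in step 2 is in the right ballpark for what
the insert loop writes. A quick back-of-the-envelope sketch (assuming even
token distribution across the 6 nodes and ignoring sstable index/metadata
overhead, so the real number will differ somewhat):

```python
# Rough check of expected sstable volume per node for the test case above.
keys_per_cf = 20     # 20 row keys per column family
cols_per_key = 100   # 100 columns per key
col_size_mb = 1      # each column value is 'X' * 1024*1024, i.e. 1MB
cfs = 3              # column families A, B and C
rf = 3               # replication_factor=3
nodes = 6

raw_mb = keys_per_cf * cols_per_key * col_size_mb * cfs  # 6000 MB written
cluster_mb = raw_mb * rf                                 # 18000 MB incl. replicas
per_node_mb = cluster_mb // nodes                        # ~3000 MB per node
print(per_node_mb)
```

So roughly 3GB of raw data per node is expected; the observed 2.4GB is
consistent with a slightly uneven token distribution.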
import pycassa

# Schema created via the CLI before running this script:
"""
create keyspace repair_test3 with replication_factor=3;
use repair_test3;
create column family A with memtable_throughput=32;
create column family B with memtable_throughput=32;
create column family C with memtable_throughput=32;
"""

servers = ['node1:9160', 'node2:9160', 'node3:9160',
           'node4:9160', 'node5:9160', 'node6:9160']

def insert_data(cf_name):
    pool = pycassa.ConnectionPool('repair_test3', servers)
    # Write at consistency level ONE so each insert only has to reach
    # a single replica.
    cf = pycassa.ColumnFamily(pool, cf_name,
                              write_consistency_level=pycassa.ConsistencyLevel.ONE)
    data = 'X' * 1024 * 1024  # 1MB column value
    # 20 keys x 100 1MB columns = ~100MB per key, ~2GB per column family
    for x in range(20):
        for y in range(100):
            print cf_name, x, y
            cf.insert(str(x), {str(y): data})

insert_data('A')
insert_data('B')
insert_data('C')