[ 
https://issues.apache.org/jira/browse/CASSANDRA-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068188#comment-13068188
 ] 

Sylvain Lebresne commented on CASSANDRA-2918:
---------------------------------------------

We do flush before computing the trees:
{noformat}
    private void doValidationCompaction(ColumnFamilyStore cfs, AntiEntropyService.Validator validator) throws IOException
    {
        // flush first so everyone is validating data that is as similar as possible
        try
        {
            StorageService.instance.forceTableFlush(cfs.table.name, cfs.getColumnFamilyName());
        }
{noformat}

However, this is done badly; more precisely, the problem here is due to 
CASSANDRA-2811, in particular the part about:
{quote}
It turns out to also have a more subtle problem for repair itself. If two 
validation compactions for the same column family (but different ranges) are 
started in a very short time interval, the first validation will block on the 
flush, but the second one may not block at all if the memtable is clean when it 
requests its own flush. In which case that second validation will be executed 
on data older than it should.
{quote}

And because RF=3, we start 3 validations for the same CF but for different 
ranges at the same time. This means that 2 of them don't correctly wait for the 
end of the flush and so validate old data. This is fixed by CASSANDRA-2816 (I'll 
fix the tests asap so that we can commit it for 0.8.2).
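
To make the race concrete, here is a minimal toy model in Java. Everything in it (ToyStore, naiveFlush, safeFlush, finishFlush) is a hypothetical name made up for illustration; none of it is Cassandra's actual code, and safeFlush is just one possible remedy, not necessarily what CASSANDRA-2816 does. It only shows why a "memtable is clean, so nothing to wait for" check lets a second caller run ahead of a flush that is still in progress.

```java
import java.util.concurrent.CompletableFuture;

/** Toy model of the flush race; hypothetical names, not Cassandra's classes. */
public class FlushRaceSketch {

    static class ToyStore {
        // true when the current memtable holds unflushed writes
        private boolean dirty = false;
        // completes when the most recently started flush has hit disk
        private CompletableFuture<Void> inFlight = CompletableFuture.completedFuture(null);

        void write() {
            dirty = true;
        }

        /** Naive: a clean memtable is treated as "nothing to wait for". */
        CompletableFuture<Void> naiveFlush() {
            if (!dirty)
                return CompletableFuture.completedFuture(null);
            return startFlush();
        }

        /** Assumed remedy: a clean memtable still returns the pending flush. */
        CompletableFuture<Void> safeFlush() {
            if (!dirty)
                return inFlight;
            return startFlush();
        }

        private CompletableFuture<Void> startFlush() {
            dirty = false; // the memtable is switched out, so it looks clean at once
            inFlight = new CompletableFuture<>();
            return inFlight;
        }

        /** Simulates the flush writer finishing its work. */
        void finishFlush() {
            inFlight.complete(null);
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();

        // Two validations request a flush back to back, before the flush completes.
        store.write();
        CompletableFuture<Void> first = store.naiveFlush();
        CompletableFuture<Void> second = store.naiveFlush();
        System.out.println("naive: first done=" + first.isDone() + ", second done=" + second.isDone());
        store.finishFlush();

        // Same scenario, but the second caller is handed the in-flight flush.
        store.write();
        first = store.safeFlush();
        second = store.safeFlush();
        System.out.println("safe:  first done=" + first.isDone() + ", second done=" + second.isDone());
        store.finishFlush();
    }
}
```

In the naive case the second future is already complete while the first is still pending, which corresponds to a validation starting on data older than it should be. With the assumed remedy both callers wait on the same pending flush.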

> After repair, missing rows from query if you don't flush other replicas
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-2918
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2918
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.2
>         Environment: Cassandra-0.8 branch @ 07/18 around 1pm PST.
>            Reporter: Cathy Daw
>            Assignee: Sylvain Lebresne
>
> *Cluster Config*
> {code}
> cathy1  -  50.57.114.45 - Token: 0
> cathy2  -  50.57.107.176 - Token: 56713727820156410577229101238628035242
> cathy3  -  50.57.114.39 - Token: 113427455640312821154458202477256070484
> {code}
> *+2) Kill cathy3:  50.57.114.39+*
> {code}
> root@cathy2:~/cass-0.8/bin# ./nodetool -h localhost ring
> Address         DC          Rack        Status State   Load            Owns    Token
>                                                                                113427455640312821154458202477256070484
> 50.57.114.45    datacenter1 rack1       Up     Normal  59.84 KB        33.33%  0
> 50.57.107.176   datacenter1 rack1       Up     Normal  59.85 KB        33.33%  56713727820156410577229101238628035242
> 50.57.114.39    datacenter1 rack1       Down   Normal  59.85 KB        33.33%  113427455640312821154458202477256070484
> {code}
> *+3) Run java stress tool+*
> {code}
> ./bin/stress -o insert -n 1000 -c 10 -l 3 -e QUORUM -d 50.57.114.45,50.57.107.176
> {code}
> *+4) Start Cassandra on cathy3+*
> *+5) Run repair on cathy3+*
> {code}
> nodetool -h cathy3 repair Keyspace1 Standard1
> {code}
> *+6) Kill cathy1 and cathy2+*
> {code}
> root@cathy3:~/cass-0.8/bin# ./nodetool -h cathy3 ring
> Address         DC          Rack        Status State   Load            Owns    Token
>                                                                                113427455640312821154458202477256070484
> 50.57.114.45    datacenter1 rack1       Down   Normal  105.46 KB       33.33%  0
> 50.57.107.176   datacenter1 rack1       Down   Normal  106 KB          33.33%  56713727820156410577229101238628035242
> 50.57.114.39    datacenter1 rack1       Up     Normal  331.33 KB       33.33%  113427455640312821154458202477256070484
> {code}
> *+7) Log into cassandra-cli on cathy3 - expect 1000 rows returned+*
> {code}
> [default@Keyspace1] consistencylevel as ONE;  
> Consistency level is set to 'ONE'.
> [default@Keyspace1] list Standard1 limit 2000;
> .....
> 323 Rows Returned.
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
