Range ghosts don't disappear as expected and accumulate
-------------------------------------------------------

                 Key: CASSANDRA-3748
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3748
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.3
         Environment: Cassandra on Debian 
            Reporter: Dominic Williams
             Fix For: 1.0.8


I have a problem where range ghosts are accumulating and cannot be removed by 
reducing GCGraceSeconds and compacting.

In our system, we have some cfs that represent "markets" where each row 
represents an item. Once an item is sold, it is removed from the market by 
passing its key to remove().

The problem, which was hidden for some time by caching, is appearing on read. 
Every few seconds our system collates a random sample from each cf/market by 
choosing a random starting point:
String startKey = RNG.nextUUID();

and then loading a page range of rows, specifying the key range as:
KeyRange keyRange = new KeyRange(pageSize);
keyRange.setStart_key(startKey);
keyRange.setEnd_key(maxKey);

The returned rows are iterated over, and ghosts ignored. If insufficient rows 
are obtained, the process is repeated using the key of the last row as the 
starting key (or wrapping if necessary etc).
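
The paging loop described above can be sketched roughly as follows. This is 
only an illustration, not our production code: an in-memory sorted map stands 
in for the cf, fetchPage() is a hypothetical stand-in for the range query, a 
row with no columns plays the role of a range ghost, rows are ordered by key 
rather than by token, and wrapping is left out. It assumes the start key is 
inclusive, so the first row of each follow-up page repeats the previous page's 
last row and is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

class RangeSampleSketch {
    /** Outcome of one sampling pass. */
    static final class Sample {
        final List<String> liveKeys = new ArrayList<>();
        int ghostsSkipped = 0;
    }

    // Hypothetical stand-in for the range query: returns up to pageSize rows
    // whose key is >= startKey, in key order. An empty column list = ghost.
    static List<Map.Entry<String, List<String>>> fetchPage(
            NavigableMap<String, List<String>> cf, String startKey, int pageSize) {
        List<Map.Entry<String, List<String>>> page = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : cf.tailMap(startKey, true).entrySet()) {
            if (page.size() == pageSize) break;
            page.add(row);
        }
        return page;
    }

    // Collects up to `wanted` live rows starting at startKey, counting the
    // ghosts that had to be iterated over along the way.
    static Sample sample(NavigableMap<String, List<String>> cf,
                         String startKey, int wanted, int pageSize) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        Sample result = new Sample();
        String start = startKey;
        String resumeKey = null; // last key of the previous page, if any
        while (result.liveKeys.size() < wanted) {
            List<Map.Entry<String, List<String>>> page = fetchPage(cf, start, pageSize);
            int newRows = 0;
            for (Map.Entry<String, List<String>> row : page) {
                if (row.getKey().equals(resumeKey)) continue; // repeated row
                newRows++;
                if (row.getValue().isEmpty()) {
                    result.ghostsSkipped++;                   // range ghost
                } else {
                    result.liveKeys.add(row.getKey());
                    if (result.liveKeys.size() == wanted) break;
                }
            }
            if (newRows == 0) break; // end of range reached (no wrapping here)
            resumeKey = page.get(page.size() - 1).getKey();
            start = resumeKey;
        }
        return result;
    }
}
```

The ghostsSkipped counter is the figure of interest: in our test it reached 
hundreds of thousands for a sample of 40 live rows.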

When performance was lagging, we did a test and found that constructing a 
random sample of 40 items (rows) involved iterating over hundreds of thousands 
of ghost rows. 

Our first attempt to deal with this was to halve our GCGraceSeconds and then 
perform major compactions. However, this had no effect on the number of ghost 
rows being returned. Furthermore, on examination it seems clear that the number 
of ghost rows created within the GCGraceSeconds window must be smaller than the 
number being returned. This looks like a bug.

We are using Cassandra 1.0.3 with Sylvain's patch from CASSANDRA-3510.








--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
