An eventually consistent approach to counting
---------------------------------------------

                 Key: CASSANDRA-1421
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1421
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Jonathan Ellis
             Fix For: 0.7.0


A counter may be implemented as multiple rows in a column family: each 
counter has a configurable shard factor, so a counter with a shard factor of 
128 is spread across 128 rows.

An increment will be a column whose (name, value) pair is (uuid, count).  The 
row shard will be uuid % shardfactor.  Timestamp is ignored.  This could be 
implemented with the existing Thrift write api, or we could add a special case 
method for it.  Either is fine; the main advantage of the former is it lets 
increments be included in batch mutations.
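
A minimal sketch of the write path described above, in Python rather than
Cassandra's own code.  The dict-based store, the "counter:shard" row key
layout, and the function names are assumptions made for illustration; only
the (uuid, count) column and the uuid % shardfactor routing come from the
proposal.

```python
import uuid
from collections import defaultdict

SHARD_FACTOR = 128  # configurable per counter; 128 is the example from the text


def shard_row_key(counter_name, increment_id, shard_factor=SHARD_FACTOR):
    """Row key of the shard that holds this increment (uuid % shardfactor)."""
    # The "name:shard" key layout is a hypothetical choice for this sketch.
    return f"{counter_name}:{increment_id.int % shard_factor}"


def increment(store, counter_name, delta, increment_id=None):
    """Record one increment as a (uuid, count) column; returns the uuid used."""
    increment_id = increment_id or uuid.uuid4()
    row = shard_row_key(counter_name, increment_id)
    # Column name is the uuid, column value is the delta.  No read is needed,
    # and replaying the same (uuid, delta) overwrites the column with itself.
    store[row][increment_id] = delta
    return increment_id


store = defaultdict(dict)  # hypothetical stand-in for a column family
first = increment(store, "page_views", 5)
increment(store, "page_views", 5, increment_id=first)  # retry of the same write
increment(store, "page_views", -2)                      # a decrement
total = sum(v for row in store.values() for v in row.values())
```

Because the retried write reuses the same uuid, it lands on the same column
and the total stays at 3 rather than 8.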

(Decrements we get for free as simply negative values.)

Each node will be responsible for aggregating *the rows replicated to it* after 
GCGraceSeconds have elapsed.  Count aggregation will be a scheduled task on 
each machine.  This will require a per-shard mutex that excludes both writes 
and reads while that shard is being aggregated.
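
The local aggregation step might look roughly like the following Python
sketch.  The Shard class, its method names, and the use of a recorded arrival
time to decide when GCGraceSeconds has elapsed are all assumptions; the
proposal does not say how a node determines an increment's age.

```python
import threading
import time
import uuid

GC_GRACE_SECONDS = 864000  # assumed value (10 days); configurable in practice


class Shard:
    """One shard row as held by a single replica (hypothetical structure)."""

    def __init__(self):
        self.lock = threading.Lock()  # the per-shard mutex
        self.increments = {}          # uuid -> (delta, arrival time in seconds)
        self.aggregate = 0            # sum of increments already folded in

    def write(self, increment_id, delta, now=None):
        arrived = time.time() if now is None else now
        with self.lock:
            self.increments[increment_id] = (delta, arrived)

    def read(self):
        with self.lock:
            return self.aggregate + sum(d for d, _ in self.increments.values())

    def aggregate_expired(self, now=None, grace=GC_GRACE_SECONDS):
        """Fold increments older than the grace period into the aggregate."""
        now = time.time() if now is None else now
        with self.lock:
            expired = [k for k, (_, t) in self.increments.items()
                       if now - t >= grace]
            for k in expired:
                delta, _ = self.increments.pop(k)
                self.aggregate += delta
            return len(expired)


shard = Shard()
shard.write(uuid.uuid4(), 4, now=0)
shard.write(uuid.uuid4(), 3, now=100)
folded = shard.aggregate_expired(now=50, grace=50)  # only the first is old enough
```

Holding the lock across the whole fold keeps a concurrent read from seeing an
increment both in the aggregate and as a pending column.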

This will not have the conflict resolution problem of CASSANDRA-580, or the 
write fragility of CASSANDRA-1072.  Normal CL will apply on both read and 
write.  Write idempotency is preserved.  I expect writes will be faster than 
either, since no reads are required at all on the write path.  Reads will be 
slower, but the read overhead can be reduced by lowering GCGraceSeconds to 
below your repair frequency if you are okay with the durability tradeoff there 
(it will not be worse than CASSANDRA-1072, for instance).  More disk space will 
be used by this approach, but that is the cheapest resource we have.

Special case code required will be much less than either the 580 or 1072 
approach -- primarily some code in StorageProxy to combine the uuid slices with 
their aggregation columns and sum them for all the shards, the local 
aggregation code, and minor changes to read/write path to add the mutex vs 
aggregation.
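
As a rough illustration of the read-side combining step, the Python sketch
below sums each shard's aggregation column with its not-yet-aggregated uuid
columns.  The fetch_row callback, the "aggregate" column name, and the
"counter:shard" row keys are hypothetical; this is not StorageProxy code.

```python
AGGREGATE = "aggregate"  # hypothetical name of the aggregation column


def read_counter(fetch_row, counter_name, shard_factor):
    """Sum aggregation columns and pending increments across all shards."""
    total = 0
    for shard in range(shard_factor):
        # fetch_row is assumed to return {} when the shard row does not exist.
        row = fetch_row(f"{counter_name}:{shard}")
        aggregated = row.get(AGGREGATE, 0)
        pending = sum(v for k, v in row.items() if k != AGGREGATE)
        total += aggregated + pending
    return total


rows = {
    "hits:0": {AGGREGATE: 10, "uuid-a": 2},
    "hits:3": {"uuid-b": -1},
}
total = read_counter(lambda key: rows.get(key, {}), "hits", 4)
```

The read touches every shard row, which is why reads cost more here than
writes do.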
 
We could also get rid of the Clock change and go back to i64 timestamps; if 
we're not going to use Clocks for increments I don't think they have much 
raison d'être.  (Those of you just joining us, see 
http://pl.atyp.us/wordpress/?p=2601 for background.)  The CASSANDRA-1072 
approach doesn't use Clocks either, or rather, it uses Clocks but not a byte[] 
value, which really means the Clock is unnecessary.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.