[jira] [Comment Edited] (CASSANDRA-4285) Atomic, eventually-consistent batches

Jonathan Ellis (JIRA) Tue, 03 Jul 2012 18:23:37 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406221#comment-13406221
 ]


Jonathan Ellis edited comment on CASSANDRA-4285 at 7/4/12 1:23 AM:
-------------------------------------------------------------------

Here's the data model I'm leaning towards:

{code}
CREATE TABLE batchlog (
  coordinator inet,
  shard       int,
  id          uuid,
  data        blob,
  PRIMARY KEY ((coordinator, shard), id)
);
{code}

(Using CASSANDRA-4179 syntax for composite-partition-key.)  As discussed in 
CASSANDRA-1311, this is going to be a very tombstone-heavy CF since the 
workload looks like

# insert batchlog entry
# replicate batch
# remove batchlog entry

So we're going to want to shard each coordinator's entries to avoid the 
problems attendant to Very Wide Rows.  Unlike most such workloads, we don't 
actually need to time-order our entries; since batches are idempotent, replay 
order won't matter.  Thus, we can just pick a random shard id (in a known 
range, say 0 to 63) to use for each entry, and on replay we will read from 
every shard.
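
The random-shard scheme can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the proposed implementation: the Batchlog class, its method names, and the in-memory dict are hypothetical stand-ins for the table, and it assumes several entries can share one (coordinator, shard) partition, told apart by id.

```python
import random
import uuid
from collections import defaultdict

SHARD_COUNT = 64  # assumption: the "0 to 63" range suggested above


class Batchlog:
    """Hypothetical in-memory stand-in for the batchlog table."""

    def __init__(self, shard_count=SHARD_COUNT, rng=None):
        self.shard_count = shard_count
        self.rng = rng or random.Random()
        # (coordinator, shard) -> {batch id: serialized batch}
        self.partitions = defaultdict(dict)

    def insert(self, coordinator, data):
        """Step 1: store the batch under a randomly chosen shard."""
        shard = self.rng.randrange(self.shard_count)
        batch_id = uuid.uuid4()
        self.partitions[(coordinator, shard)][batch_id] = data
        return shard, batch_id

    def remove(self, coordinator, shard, batch_id):
        """Step 3: drop the entry once the batch has been replicated."""
        self.partitions[(coordinator, shard)].pop(batch_id, None)

    def replay(self, coordinator):
        """Yield every pending batch of a coordinator, in no particular order."""
        for shard in range(self.shard_count):
            pending = self.partitions.get((coordinator, shard), {})
            yield from pending.values()
```

Because no ordering is required, replay only has to visit each of the shards once; the caller needs the (shard, id) pair returned by insert to remove the entry later.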

Other notes:
- I think we can cheat in the replication strategy by knowing that part of the 
partition key is the coordinator address, to avoid replicating to itself
- default RF will be 1; operators can increase if desired
- operators can also disable [local] commitlog on the batchlog CF, if desired
- gcgs can be safely set to zero in all cases; the worst that happens is that we 
replay a write a second time, which is not a problem since batches are idempotent
- Currently we always write tombstones to sstables in Memtable flush.  Should 
add a check for gcgs=0 to do an extra removeDeleted pass, which would make the 
actual sstable contents for batchlog almost nothing (since the normal, 
everything-is-working case will be that it gets deleted out while still in the 
memtable).
- I think we do want to use inetaddr instead of node uuid as the coordinator id 
here -- this gives a replacement node (w/ the same IP) that takes over for a 
dead one "automatic" ownership of the dead node's batchlog.
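
To make the gcgs=0 flush idea concrete, here is a rough Python sketch.  It is not Cassandra's actual flush code: the Cell type, the flush function, and the rule that a tombstone is droppable once its deletion time is before "now" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """Hypothetical cell: a live value, or a tombstone if deleted_at is set."""
    value: Optional[bytes]
    deleted_at: Optional[int] = None  # deletion time in seconds; None if live


def flush(memtable, gc_grace_seconds, now):
    """Return the cells that would be written to the sstable."""
    if gc_grace_seconds != 0:
        # Behaviour described above: tombstones are always written on flush.
        return dict(memtable)
    # Proposed extra pass for gcgs=0: skip tombstones whose deletion time
    # has already passed (assumed purge rule), keep everything else.
    return {
        key: cell
        for key, cell in memtable.items()
        if cell.deleted_at is None or cell.deleted_at >= now
    }
```

In the everything-is-working case most batchlog entries are already tombstones by flush time, so under this sketch very little would reach the sstable.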
                
> Atomic, eventually-consistent batches
> -------------------------------------
>
>                 Key: CASSANDRA-4285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4285
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>
> I discussed this in the context of triggers (CASSANDRA-1311) but it's useful 
> as a standalone feature as well.

