[
https://issues.apache.org/jira/browse/CASSANDRA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406221#comment-13406221
]
Jonathan Ellis edited comment on CASSANDRA-4285 at 7/4/12 1:23 AM:
-------------------------------------------------------------------
Here's the data model I'm leaning towards:
{code}
CREATE TABLE batchlog (
  coordinator inet,
  shard       int,
  id          uuid,
  data        blob,
  PRIMARY KEY ((coordinator, shard), id)
);
{code}
(Using CASSANDRA-4179 syntax for composite-partition-key.) As discussed in
CASSANDRA-1311, this is going to be a very tombstone-heavy CF since the
workload looks like
# insert batchlog entry
# replicate batch
# remove batchlog entry
So we're going to want to shard each coordinator's entries to avoid the
problems attendant to Very Wide Rows. Unlike most such workloads, we don't
actually need to time-order our entries; since batches are idempotent, replay
order won't matter. Thus, we can just pick a random shard id (in a known
range, say 0 to 63) to use for each entry, and on replay we will read from
each shard.
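A rough sketch of the sharding idea, using an in-memory dict as a stand-in for the batchlog table. The names here (NUM_SHARDS, BatchlogSketch, pick_shard) are made up for illustration and are not Cassandra APIs; the 64-shard range is just the "say 0 to 63" figure from above.

```python
import random
import uuid

NUM_SHARDS = 64  # assumed range 0..63, taken from the "say 0 to 63" above


def pick_shard(rng=random):
    """Pick a random shard for a new entry; no time ordering is needed."""
    return rng.randrange(NUM_SHARDS)


class BatchlogSketch:
    """In-memory stand-in for the batchlog table (hypothetical, not Cassandra code)."""

    def __init__(self):
        # (coordinator, shard) -> {entry id -> serialized batch}
        self.partitions = {}

    def insert(self, coordinator, data, rng=random):
        shard = pick_shard(rng)
        entry_id = uuid.uuid4()
        self.partitions.setdefault((coordinator, shard), {})[entry_id] = data
        return shard, entry_id

    def remove(self, coordinator, shard, entry_id):
        self.partitions.get((coordinator, shard), {}).pop(entry_id, None)

    def replay(self, coordinator):
        """Read every shard for this coordinator; replay order does not matter."""
        pending = []
        for shard in range(NUM_SHARDS):
            pending.extend(self.partitions.get((coordinator, shard), {}).values())
        return pending
```

Since the shard is random, the writer has to keep the (shard, id) pair around to remove the entry after the batch is replicated; replay just scans all shards for the coordinator.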
Other notes:
- I think we can cheat in the replication strategy by knowing that part of the
partition key is the coordinator address, to avoid replicating to itself
- default RF will be 1; operators can increase if desired
- operators can also disable [local] commitlog on the batchlog CF, if desired
- gcgs can be safely set to zero in all cases; worst that happens is we replay
a write a second time, which is not a problem
- Currently we always write tombstones to sstables in Memtable flush. Should
add a check for gcgs=0 to do an extra removeDeleted pass, which would make the
actual sstable contents for batchlog almost nothing (since the normal,
everything-is-working case will be that it gets deleted out while still in the
memtable).
- I think we do want to use inetaddr instead of node uuid as the coordinator id
here -- this gives a replacement node (w/ the same IP) that takes over for a
dead one "automatic" ownership of the dead node's batchlog.
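To make the gcgs=0 flush check concrete, here is a toy model of the extra pass. Cell and flush are hypothetical names for illustration only; this is not the actual Memtable or removeDeleted code, and timestamps are plain integer seconds.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: Optional[bytes]  # None marks a tombstone
    deleted_at: int = 0     # deletion time in seconds; only used for tombstones


def flush(memtable: Dict[str, Cell], gc_grace_seconds: int, now: int) -> Dict[str, Cell]:
    """Return the cells that would be written to the sstable (toy model)."""
    if gc_grace_seconds != 0:
        # Behavior described above: tombstones are always written at flush.
        return dict(memtable)
    # Proposed extra pass: with gcgs=0 a tombstone is purgeable as soon as its
    # deletion time has passed, so it never needs to reach disk.
    return {
        key: cell
        for key, cell in memtable.items()
        if cell.value is not None or cell.deleted_at > now
    }
```

In the normal case, where an entry is inserted and deleted before the memtable flushes, only the tombstone is left in the memtable and the pass drops it, so nothing for that entry is written out.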
> Atomic, eventually-consistent batches
> -------------------------------------
>
> Key: CASSANDRA-4285
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4285
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
>
> I discussed this in the context of triggers (CASSANDRA-1311) but it's useful
> as a standalone feature as well.