[
https://issues.apache.org/jira/browse/CASSANDRA-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806934#comment-13806934
]
Aleksey Yeschenko commented on CASSANDRA-6134:
----------------------------------------------
I don't like the idea of having two batchlog cfs, and two separate batchlog
implementations. But as Oleg says, there is a lot of room for improvement in
the current batchlog implementation.
I want to bring in as many of those improvements as possible without changing
the schema (in incompatible ways).
Regarding the full scan - we can actually start using v1 uuids instead of random
ones for the batchlog keys, without changing the key type ('uuid' will accept
any uuid version, unlike 'timeuuid', which would only accept v1). And then stop
replaying as soon as we stumble upon a batch that is too new. (Can't exactly do
that in 2.0,
but we can start using v1 ids in 2.0 and tell people to either force batchlog
replay or wait for a while on the fully upgraded 2.0 cluster before moving to
2.1, where we could start using this logic). We already require a stop at 2.0
for anyone upgrading to 2.1 so this should work.
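The v1-uuid idea above can be sketched in Java; `v1FromMillis`, `unixTimestamp`, and `tooNewToReplay` are illustrative helper names of mine, not code from Cassandra:

```java
import java.util.UUID;

public class BatchlogReplayCheck {
    // Offset between the UUID v1 epoch (1582-10-15) and the Unix epoch,
    // in 100-nanosecond intervals.
    static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    // Build a version-1 UUID whose embedded time is the given unix millis
    // (clock sequence and node fields are zeroed; illustration only).
    static UUID v1FromMillis(long millis) {
        long t100 = millis * 10000 + UUID_EPOCH_OFFSET;
        long msb = ((t100 & 0xFFFFFFFFL) << 32)        // time_low
                 | (((t100 >> 32) & 0xFFFFL) << 16)    // time_mid
                 | 0x1000L | ((t100 >> 48) & 0x0FFFL); // version 1 | time_hi
        return new UUID(msb, 0x8000000000000000L);     // RFC 4122 variant
    }

    // Millis-since-epoch encoded in a version-1 (time-based) UUID.
    static long unixTimestamp(UUID id) {
        return (id.timestamp() - UUID_EPOCH_OFFSET) / 10000;
    }

    // With ids generated in timestamp order, replay can stop at the first
    // batch younger than (now - writeTimeout) instead of scanning every row.
    static boolean tooNewToReplay(UUID id, long nowMillis, long writeTimeoutMillis) {
        return unixTimestamp(id) > nowMillis - writeTimeoutMillis;
    }
}
```

Note that `UUID.timestamp()` throws for a random (version 4) uuid, which is why the id generation has to switch to v1 even though the column type can stay 'uuid'.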
> More efficient BatchlogManager
> ------------------------------
>
> Key: CASSANDRA-6134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6134
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Oleg Anastasyev
> Priority: Minor
> Attachments: BatchlogManager.txt
>
>
> As we discussed earlier in CASSANDRA-6079 this is the new BatchManager.
> It stores batch records in
> {code}
> CREATE TABLE batchlog (
>     id_partition int,
>     id timeuuid,
>     data blob,
>     PRIMARY KEY (id_partition, id)
> ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (id DESC)
> {code}
> where id_partition is the minute-since-epoch of the batch's timeuuid id.
> So when it scans for batches to replay, it scans within a single partition for
> a slice of ids from the last processed date up to now minus the write timeout.
> So no full batchlog CF scan and no flood of random reads happen on a normal
> cycle.
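A minimal sketch of the bucketing described above, assuming the partition key is derived from the time embedded in the batch's timeuuid; the class and helper names are mine, not from the patch:

```java
import java.util.UUID;
import java.util.concurrent.TimeUnit;

public class BatchlogPartitioner {
    // Offset between the UUID v1 epoch (1582-10-15) and the Unix epoch,
    // in 100-nanosecond intervals.
    static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    // id_partition for a batch id: minutes since the unix epoch, taken from
    // the time embedded in the version-1 (timeuuid) batch id.
    static int idPartition(UUID batchId) {
        long millis = (batchId.timestamp() - UUID_EPOCH_OFFSET) / 10000;
        return (int) TimeUnit.MILLISECONDS.toMinutes(millis);
    }

    // Helper for illustration: a v1 UUID carrying the given unix millis
    // (clock sequence and node fields are zeroed).
    static UUID v1FromMillis(long millis) {
        long t100 = millis * 10000 + UUID_EPOCH_OFFSET;
        long msb = ((t100 & 0xFFFFFFFFL) << 32)        // time_low
                 | (((t100 >> 32) & 0xFFFFL) << 16)    // time_mid
                 | 0x1000L | ((t100 >> 48) & 0x0FFFL); // version 1 | time_hi
        return new UUID(msb, 0x8000000000000000L);     // RFC 4122 variant
    }
}
```

The replayer then only has to walk the minute buckets between its last replayed position and (now - write timeout), issuing one narrow slice query per bucket.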
> Other improvements:
> 1. It runs every 1/2 of the write timeout and replays all batches written up
> to 0.9 * write timeout before now. This way we ensure that batched updates
> will be replayed by the moment the client times out on the coordinator.
> 2. It submits all mutations from a single batch in parallel (like StorageProxy
> does). The old implementation replayed them one by one, so a client could see
> half-applied batches in the CF for a long time (depending on the size of the
> batch).
> 3. It fixes a subtle race bug with incorrect hint TTL calculation.
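Point 2 above (playing a batch's mutations concurrently instead of one by one) can be sketched with a plain executor; `Mutation` here is a stand-in interface, not Cassandra's real mutation class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class ParallelBatchReplay {
    // Stand-in for a replayed row mutation; the real code sends Cassandra
    // Mutation objects through the write path.
    interface Mutation { void apply(); }

    // Submit every mutation of one batch at once, then wait for all of them,
    // instead of applying them sequentially.
    static void replayBatch(List<Mutation> batch, ExecutorService pool) {
        List<Future<?>> inFlight = new ArrayList<>();
        for (Mutation m : batch)
            inFlight.add(pool.submit(m::apply));
        for (Future<?> f : inFlight) {
            try {
                f.get(); // the batch is done only when every mutation acks
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
    }
}
```

Submitting everything first and only then waiting bounds the window in which a reader can observe a half-applied batch by the slowest single mutation, rather than by the sum of all of them.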
--
This message was sent by Atlassian JIRA
(v6.1#6144)