[
https://issues.apache.org/jira/browse/CASSANDRA-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712186#comment-14712186
]
Aleksey Yeschenko commented on CASSANDRA-9673:
----------------------------------------------
Pushed a bit more to [my
branch|https://github.com/iamaleksey/cassandra/commits/9673-3.0] on top of your
changes.
1. Makes it so that only {{BatchlogManager}} itself is aware of the
{{system.batches}} table and does any writes to it (we already had a
{{BatchlogManager::deleteBatch}} method, in fact)
2. Changes the code so that we don't allocate a redundant {{ArrayList}} in
{{Batch}}, and so that we never mutate those collections (encoded/decoded) in
place
3. Removes the (now) redundant extra schedule call at {{BatchlogManager}} startup
4. Makes it so that if we are dealing with an encoded (remote) batch, then the
mutations are always in the current messaging version format. Having the
version separate felt brittle.
5. Switches to vints for batch and hint encoding
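To illustrate what point 5 buys us, here is a minimal sketch of a variable-length integer codec. It is a generic 7-bits-per-byte scheme, not Cassandra's actual vint wire format, and {{VIntSketch}}, {{encode}} and {{decode}} are hypothetical names used only for this example.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only: a generic little-endian base-128 varint,
// NOT the exact vint layout Cassandra uses on the wire.
final class VIntSketch {
    // Encodes the value (treated as unsigned) using 7 payload bits per byte.
    // The high bit of each byte is set when more bytes follow.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes a value produced by encode(), stopping at the first byte
    // whose continuation bit is clear.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }
}
```

With this scheme a small value such as 300 takes two bytes instead of the eight a fixed-width long would need, which is the kind of saving that matters for sizes and counts in batch and hint encoding.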
One of the goals for me was overall consistency with hints code, since the two
are very related. After some coding, however, I realized that
{{BatchStoreMessage}} was indeed redundant. It carries no extra information
other than the {{Batch}} itself (unlike {{HintMessage}} that also carries the
host id). Symmetry with {{BatchRemoveMessage}} was nice, but the latter was
also completely redundant - it merely wraps the UUID and adds nothing. So I
ditched both classes, and suggest that we just marshal Batch/UUID instances,
raw.
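As a rough sketch of what marshalling a raw UUID without a wrapper message could look like, assuming it is written as two longs: {{RawUuidSketch}} and its methods are hypothetical names for illustration, not the classes on the branch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.UUID;

// Hypothetical illustration: the UUID itself is the payload, so no wrapper
// class is needed around it.
final class RawUuidSketch {
    // Writes the UUID as 16 bytes: most significant long, then least significant.
    static void write(UUID id, DataOutputStream out) throws IOException {
        out.writeLong(id.getMostSignificantBits());
        out.writeLong(id.getLeastSignificantBits());
    }

    // Reads the two longs back in the same order.
    static UUID read(DataInputStream in) throws IOException {
        long msb = in.readLong();
        long lsb = in.readLong();
        return new UUID(msb, lsb);
    }

    // Convenience helpers for in-memory use; in-memory streams do not
    // normally fail, so IOException is rethrown unchecked.
    static byte[] toBytes(UUID id) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(16);
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(id, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static UUID fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point is only that a message whose sole content is a UUID carries nothing the 16 bytes do not already carry.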
I'm not done with the review yet, and don't have an answer to your uuid
question atm. Just wanted to push the latest.
> Improve batchlog write path
> ---------------------------
>
> Key: CASSANDRA-9673
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9673
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Aleksey Yeschenko
> Assignee: Stefania
> Labels: performance
> Fix For: 3.0 beta 2
>
> Attachments: 9673_001.tar.gz, 9673_004.tar.gz,
> gc_times_first_node_patched_004.png, gc_times_first_node_trunk_004.png
>
>
> Currently we allocate an on-heap {{ByteBuffer}} to serialize the batched
> mutations into, before sending it to a distant node, generating unnecessary
> garbage (potentially a lot of it).
> With materialized views using the batchlog, it would be nice to optimise the
> write path:
> - introduce a new verb ({{Batch}})
> - introduce a new message ({{BatchMessage}}) that would encapsulate the
> mutations, expiration, and creation time (similar to {{HintMessage}} in
> CASSANDRA-6230)
> - have MS serialize it directly instead of relying on an intermediate buffer
> To avoid merely shifting the temp buffer to the receiving side(s) we should
> change the structure of the batchlog table to use a list or a map of
> individual mutations.