[ 
https://issues.apache.org/jira/browse/CASSANDRA-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712186#comment-14712186
 ] 

Aleksey Yeschenko commented on CASSANDRA-9673:
----------------------------------------------

Pushed a bit more to [my 
branch|https://github.com/iamaleksey/cassandra/commits/9673-3.0] on top of your 
changes.

1. Makes it so that only {{BatchlogManager}} itself is aware of the 
{{system.batches}} table and does any writes to it (we already had a 
{{BatchlogManager::deleteBatch}} method, in fact)
2. Changes the code so that we don't allocate a redundant {{ArrayList}} in 
{{Batch}}, and so that we never mutate those collections (encoded/decoded) in 
place
3. Removes the now-redundant extra schedule call at {{BatchlogManager}} startup
4. Makes it so that if we are dealing with an encoded (remote) batch, then the 
mutations are always in the current messaging version format. Having the 
version separate felt brittle.
5. Switches to vints for batch and hint encoding
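
To illustrate why vints (item 5) help with small values, here is a rough 
sketch of a generic base-128 variable-length integer. This is NOT Cassandra's 
own vint wire format or API; the class and method names ({{VarIntSketch}}, 
{{encode}}, {{decode}}) are made up for the example, and it only shows the 
general idea that small numbers take fewer bytes than a fixed 8-byte long.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only: a generic base-128 varint,
// not the encoding Cassandra itself uses.
public class VarIntSketch {
    // Encode an unsigned value 7 bits at a time, low-order group first.
    // The high bit of each byte signals that more bytes follow.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decode a value produced by encode(), stopping at the first byte
    // whose continuation bit is clear.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 127L, 128L, 300L, 1_000_000L};
        for (long v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length
                    + " byte(s), decoded back to " + decode(enc));
        }
    }
}
```

Sizes and counts in a batch or hint are usually small, so most of them would 
fit in one or two bytes under a scheme like this.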

One of the goals for me was overall consistency with hints code, since the two 
are very related. After some coding, however, I realized that 
{{BatchStoreMessage}} was indeed redundant. It carries no extra information 
other than the {{Batch}} itself (unlike {{HintMessage}} that also carries the 
host id). Symmetry with {{BatchRemoveMessage}} was nice, but the latter was 
also completely redundant - it merely wraps the UUID and adds nothing. So I 
ditched both classes, and suggest that we just marshal Batch/UUID instances, 
raw.
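
As a rough sketch of what marshalling a UUID "raw" could look like, with no 
wrapper message class around it: the code below writes the two 64-bit halves 
straight into a buffer and reads them back. It uses only the JDK; the class 
and method names ({{RawUuidSketch}}, {{serialize}}, {{deserialize}}) are 
hypothetical and are not the serializers on the branch.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical illustration only: not the serializer used in the patch.
public class RawUuidSketch {
    // A UUID is exactly 128 bits, so it fits in 16 bytes with no framing.
    static ByteBuffer serialize(UUID id) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(id.getMostSignificantBits());
        buf.putLong(id.getLeastSignificantBits());
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    // Read the two halves back in the same order they were written.
    static UUID deserialize(ByteBuffer buf) {
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        ByteBuffer wire = serialize(id);
        System.out.println("bytes on the wire: " + wire.remaining());
        System.out.println("round trip ok: " + id.equals(deserialize(wire)));
    }
}
```

A wrapper class that only holds the UUID would serialize to the same 16 
bytes, which is why it adds nothing on the wire.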

I'm not done with the review yet, and don't have an answer to your uuid 
question atm. Just wanted to push the latest.

> Improve batchlog write path
> ---------------------------
>
>                 Key: CASSANDRA-9673
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9673
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Stefania
>              Labels: performance
>             Fix For: 3.0 beta 2
>
>         Attachments: 9673_001.tar.gz, 9673_004.tar.gz, 
> gc_times_first_node_patched_004.png, gc_times_first_node_trunk_004.png
>
>
> Currently we allocate an on-heap {{ByteBuffer}} to serialize the batched 
> mutations into, before sending it to a distant node, generating unnecessary 
> garbage (potentially a lot of it).
> With materialized views using the batchlog, it would be nice to optimise the 
> write path:
> - introduce a new verb ({{Batch}})
> - introduce a new message ({{BatchMessage}}) that would encapsulate the 
> mutations, expiration, and creation time (similar to {{HintMessage}} in 
> CASSANDRA-6230)
> - have MS serialize it directly instead of relying on an intermediate buffer
> To avoid merely shifting the temp buffer to the receiving side(s) we should 
> change the structure of the batchlog table to use a list or a map of 
> individual mutations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
