[jira] [Commented] (CASSANDRA-9673) Improve batchlog write path

Stefania (JIRA) Mon, 24 Aug 2015 23:17:39 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710673#comment-14710673
 ]


Stefania commented on CASSANDRA-9673:
-------------------------------------

Here are the CI results (build #5 still pending).

Failing utests:

Build #4:
* {{org.apache.cassandra.db.RecoveryManagerTest.testRecoverPITUnordered}} - 
also timed out on 3.0 (build #105)

Build #3:
* {{org.apache.cassandra.io.sstable.IndexSummaryManagerTest.testRedistributeSummaries-compression}} 
- timed out due to a compaction, seems unrelated

* {{org.apache.cassandra.cql3.MaterializedViewLongTest.testConflictResolution}} 
- this one does worry me: it fails intermittently on this branch but seems to 
always pass on 3.0. However, it doesn't go through {{SP.mutateMV()}}, so I 
don't see how we could have broken it. Perhaps because we've removed the 
dedicated stage? Sample failure 
[here|http://cassci.datastax.com/job/stef1927-9673-3.0-testall/3/testReport/org.apache.cassandra.cql3/MaterializedViewLongTest/testConflictResolution/].

The failing dtests seem in line with 3.0. Note that I have a fix for the two 
failing tests in batch_test.py 
[here|https://github.com/riptano/cassandra-dtest/pull/496/commits].

I also added a function ({{nanoSince()}}) to distinguish legacy mutations with 
clashing timestamps, but it's very slow. Do you think we need this?

{code}
    UUID newId = id;
    if (id.version() != 1 || timestamp != UUIDGen.unixTimestamp(id))
        newId = UUIDGen.getTimeUUID(timestamp, nanoSince(id, timestamp));
{code}

As far as I understand, only 1.2 mutations would have non-time UUIDs 
({{id.version() != 1}}). Strictly speaking, however, time UUIDs would not 
necessarily match {{written_at}}, which is the current time in micros divided 
by 1000, whereas the UUID would have been created a little earlier. So if we 
crossed the millisecond boundary in between, we would have 
{{timestamp != UUIDGen.unixTimestamp(id)}}. The code I am talking about is in 
{{LegacyBatchMigrator.apply}} and is called when applying legacy mutations. 
WDYT?
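
To make the millisecond-boundary case concrete, here is a minimal, self-contained sketch. It uses only {{java.util.UUID}}, not Cassandra's {{UUIDGen}}, and the names {{TimeUuidBoundary}}, {{timeUuid}}, {{unixMillis}} and {{needsNewId}} are hypothetical helpers for illustration. It mirrors only the condition in the snippet above and does not reproduce {{nanoSince()}}.

```java
import java.util.UUID;

// Illustrative only: hypothetical helpers, not the Cassandra implementation.
final class TimeUuidBoundary {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000L;

    // Builds a version-1 UUID whose embedded timestamp is unixMillis.
    // The clock sequence and node bits are arbitrary constants.
    static UUID timeUuid(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET_100NS;
        long msb = (ts & 0xFFFFFFFFL) << 32;       // time_low
        msb |= ((ts >>> 32) & 0xFFFFL) << 16;      // time_mid
        msb |= (ts >>> 48) & 0x0FFFL;              // time_hi
        msb |= 0x1000L;                            // version 1
        long lsb = 0x8000010203040506L;            // variant bits 10 + arbitrary node
        return new UUID(msb, lsb);
    }

    // Extracts the Unix time in milliseconds from a version-1 UUID.
    static long unixMillis(UUID id) {
        return (id.timestamp() - UUID_EPOCH_OFFSET_100NS) / 10_000L;
    }

    // Same shape as the condition above: true for non-time UUIDs, or for
    // time UUIDs whose embedded millisecond differs from written_at.
    static boolean needsNewId(UUID id, long writtenAtMillis) {
        return id.version() != 1 || writtenAtMillis != unixMillis(id);
    }

    public static void main(String[] args) {
        long createdMicros = 1_440_000_000_000_999L;  // UUID created here
        long writtenMicros = createdMicros + 2;       // written_at taken 2 us later
        UUID id = timeUuid(createdMicros / 1000);

        // Same millisecond: the id is kept.
        System.out.println(needsNewId(id, createdMicros / 1000));
        // Millisecond boundary crossed between the two reads: mismatch.
        System.out.println(needsNewId(id, writtenMicros / 1000));
        // A random (version 4) UUID is never a time UUID.
        System.out.println(needsNewId(UUID.randomUUID(), writtenMicros / 1000));
    }
}
```

In this sketch the two clock reads are only 2 microseconds apart, yet they land in different milliseconds, so a perfectly valid time UUID would still be regenerated.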

> Improve batchlog write path
> ---------------------------
>
>                 Key: CASSANDRA-9673
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9673
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Stefania
>              Labels: performance
>             Fix For: 3.0 beta 2
>
>         Attachments: 9673_001.tar.gz, 9673_004.tar.gz, 
> gc_times_first_node_patched_004.png, gc_times_first_node_trunk_004.png
>
>
> Currently we allocate an on-heap {{ByteBuffer}} to serialize the batched 
> mutations into, before sending it to a distant node, generating unnecessary 
> garbage (potentially a lot of it).
> With materialized views using the batchlog, it would be nice to optimise the 
> write path:
> - introduce a new verb ({{Batch}})
> - introduce a new message ({{BatchMessage}}) that would encapsulate the 
> mutations, expiration, and creation time (similar to {{HintMessage}} in 
> CASSANDRA-6230)
> - have MS serialize it directly instead of relying on an intermediate buffer
> To avoid merely shifting the temp buffer to the receiving side(s) we should 
> change the structure of the batchlog table to use a list or a map of 
> individual mutations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
