[ 
https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-3578:
--------------------------------

    Attachment: oprate.svg
                latency.svg

A patch for this is available for review at 
[3578-2|https://github.com/belliottsmith/cassandra/tree/iss-3578-2]

Already discussed:
- Chained headers
- Ensures commits are persistent, using the suggested synchronisation scheme 
(read/write lock)

Further changes:
- Writes are completely non-blocking unless the CLE is behind or you're using 
Batch CLE
- On activating a new CLS, we trigger a sync() of the log; so now we sync() 
every pollInterval elapsed, OR commit_log_segment_size_in_mb written, whichever 
condition is met first after the previous sync. This allows us to stay a little 
ahead of pollInterval, giving us some breathing room during "brief" spikes in 
write load in excess of what the disk can handle.
- Once we've completely written a CLS we immediately close/unmap the buffer
- On any drop keyspace or column family command, or on a node drain, we force 
the recycling of any CLS in use at the time of the call (this addresses 
CASSANDRA-5911. I included it in this ticket as it was easier to think about 
both at once)
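
As a rough illustration of the "whichever condition is met first" sync rule 
above, here is a minimal sketch. The class and method names are hypothetical 
and are not taken from the patch; it only models the decision of when a sync is 
due, assuming the caller supplies the clock and the segment size in bytes.

```java
// Hypothetical sketch of the sync trigger; not the patch's actual code.
final class SyncPolicySketch {
    private final long pollIntervalMillis;
    private final long segmentSizeBytes;
    private long lastSyncMillis;
    private long bytesSinceSync;

    SyncPolicySketch(long pollIntervalMillis, long segmentSizeBytes, long nowMillis) {
        this.pollIntervalMillis = pollIntervalMillis;
        this.segmentSizeBytes = segmentSizeBytes;
        this.lastSyncMillis = nowMillis;
    }

    // Called by writers after appending to the log.
    synchronized void recordWrite(long bytes) {
        bytesSinceSync += bytes;
    }

    // A sync is due once pollInterval has elapsed OR a segment's worth of
    // bytes has been written since the previous sync, whichever comes first.
    synchronized boolean syncDue(long nowMillis) {
        return nowMillis - lastSyncMillis >= pollIntervalMillis
            || bytesSinceSync >= segmentSizeBytes;
    }

    // Called once the sync has completed; resets both conditions.
    synchronized void markSynced(long nowMillis) {
        lastSyncMillis = nowMillis;
        bytesSinceSync = 0;
    }
}
```

In the patch itself the size-based sync is triggered by activating a new 
segment rather than by a byte counter; the counter here is a simplification.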

Some implementation detail changes:
- We maintain a separate cfDirty and cfClean set now, which we merge on demand, 
to avoid allocating/deallocating AtomicIntegers all of the time
- We now reject row mutations larger than HALF the size of a CL segment, rather 
than the full segment size - this is to stop burning through lots of CLS if we 
try to switch to a new segment but are then beaten to allocating the first item 
in it.

Some future work:
- Could reasonably easily have a guaranteed non-blocking CL.add method, which 
yields a Future if blocking becomes necessary; this could allow us to 
short-circuit the write-path a little to reduce latency in the majority of 
cases where blocking doesn't happen
- Compressed CL to improve IO
- Need to improve error handling in CL in general

Note, Vijay, that I briefly switched to a simpler blocking approach for 
switching in a new segment, since you said you preferred it, but I decided to 
revert to non-blocking because of the potential future dividends of that 
guarantee.

I've attached two graphs to demonstrate the effect of this patch in a real 
4-node cluster. Note that the latency graph has a logarithmic y-axis, so with 
this patch the worst measured write latency looks to be an order of magnitude 
better; variance in latency at the tail end is also lower. This is also why 
there are fewer measurements: the stderr of the measurements was smaller, so 
stress finished earlier. There is also a roughly 12% increase in maximum 
throughput on this particular cluster.

> Multithreaded commitlog
> -----------------------
>
>                 Key: CASSANDRA-3578
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3578
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>            Priority: Minor
>              Labels: performance
>         Attachments: 0001-CASSANDRA-3578.patch, ComitlogStress.java, 
> Current-CL.png, Multi-Threded-CL.png, latency.svg, oprate.svg, 
> parallel_commit_log_2.patch
>
>
> Brian Aker pointed out a while ago that allowing multiple threads to modify 
> the commitlog simultaneously (reserving space for each with a CAS first, the 
> way we do in the SlabAllocator.Region.allocate) can improve performance, 
> since you're not bottlenecking on a single thread to do all the copying and 
> CRC computation.
> Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes 
> doable.
> (moved from CASSANDRA-622, which was getting a bit muddled.)
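
The CAS-based space reservation described in the issue summary can be sketched 
roughly as follows. This is an illustrative assumption of how such a scheme 
might look, not the code from the patch or from SlabAllocator; the class name 
and methods are hypothetical, and a heap ByteBuffer stands in for the mmap'd 
segment.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical segment; names are illustrative, not Cassandra's actual classes.
final class SegmentSketch {
    private final ByteBuffer buffer;
    private final AtomicInteger allocatePosition = new AtomicInteger(0);

    SegmentSketch(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity); // stands in for an mmap'd segment
    }

    // Reserves size bytes with a CAS loop; returns the start offset of the
    // reserved region, or -1 if the segment cannot fit the request.
    int allocate(int size) {
        while (true) {
            int prev = allocatePosition.get();
            if (size < 0 || size > buffer.capacity() - prev) {
                return -1; // caller would switch to a new segment
            }
            if (allocatePosition.compareAndSet(prev, prev + size)) {
                return prev; // this thread now owns [prev, prev + size)
            }
            // Lost the race to another writer; retry with the new position.
        }
    }

    // Copies a record into a region previously reserved by allocate().
    // Each thread writes through its own duplicate, so positions don't clash.
    void write(int offset, byte[] record) {
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        view.put(record);
    }
}
```

Because each writer owns a disjoint region once its CAS succeeds, the copying 
and CRC computation can proceed on many threads without a shared lock.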



--
This message was sent by Atlassian JIRA
(v6.1#6144)
