[
https://issues.apache.org/jira/browse/CASSANDRA-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195902#comment-13195902
]
Michaël Figuière commented on CASSANDRA-3578:
---------------------------------------------
In this patch I propose a different approach from Piotr's. In this
implementation there is only one thread to handle syncs; all the processing,
that is serialization, CRC computation and copying the RM into the mmap'd
segment, is handled directly by the writer threads. These threads exchange data
with the syncer thread in a non-blocking way, so the ExecutorService
abstraction has been replaced by a lighter structure.
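As a rough illustration of this writer-thread path (hypothetical names and an
assumed entry layout, not the actual patch code), each writer could CAS-reserve
a slice of the mmap'd segment and then serialize, checksum and copy its
mutation there itself, leaving only the disk sync to the single syncer thread:
{code:java}
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.CRC32;

// Sketch only: illustrative names, assumed entry layout (length + payload + CRC).
class SegmentWriterSketch
{
    private final MappedByteBuffer segment;
    private final AtomicInteger allocatedBytes = new AtomicInteger(0);

    SegmentWriterSketch(MappedByteBuffer segment)
    {
        this.segment = segment;
    }

    /** Called concurrently by writer threads. Returns -1 when the segment is full. */
    int append(byte[] serializedMutation)
    {
        int size = 4 + serializedMutation.length + 8; // length + payload + CRC
        int offset;
        do
        {
            offset = allocatedBytes.get();
            if (offset + size > segment.capacity())
                return -1; // the caller must trigger the segment switch
        }
        while (!allocatedBytes.compareAndSet(offset, offset + size));

        CRC32 crc = new CRC32();
        crc.update(serializedMutation);

        ByteBuffer slice = segment.duplicate(); // private view: no shared position to contend on
        slice.position(offset);
        slice.putInt(serializedMutation.length);
        slice.put(serializedMutation);
        slice.putLong(crc.getValue());
        return offset;
    }
}
{code}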
Several components of the CL were challenging to implement in this manner:
*CL Segment switch*
Switching to a new CL segment when the current one is full isn't
straightforward without locks. Here we use a boolean mark that is atomically
CASed by a writer thread, giving it the responsibility for performing the
switch. If the mark can't be grabbed, the thread waits on a condition that is
later reused, with stamps to avoid any ABA problem.
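A minimal sketch of this election, assuming a simple stamp counter and a single
reusable condition (the names are mine, not the patch's):
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: the writer that wins the CAS performs the switch, the others
// wait; the stamp lets a reused condition tell "my" switch from a later one.
class SegmentSwitcher
{
    private final AtomicBoolean switching = new AtomicBoolean(false);
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition switched = lock.newCondition();
    private volatile long stamp = 0; // incremented after every completed switch

    void switchSegment()
    {
        long observed = stamp;
        if (switching.compareAndSet(false, true))
        {
            allocateNewSegment(); // the winner performs the actual switch
            lock.lock();
            try
            {
                stamp++;
                switching.set(false);
                switched.signalAll();
            }
            finally
            {
                lock.unlock();
            }
        }
        else
        {
            lock.lock();
            try
            {
                // Only wait if the switch we observed hasn't completed yet (no ABA).
                while (stamp == observed && switching.get())
                    switched.awaitUninterruptibly();
            }
            finally
            {
                lock.unlock();
            }
        }
    }

    private void allocateNewSegment()
    {
        // placeholder: create/activate the next mmap'd segment
    }
}
{code}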
*Batch CL*
The Batch CL strategy is considered the safer mode for Cassandra as it
guarantees the client that the RM is synced to disk before answering. With a
multithreaded CL, we must ensure that we don't acknowledge an RM that is synced
to disk but preceded by an unsynced RM in the CL segment, as that would make
replaying the RM impossible. For this reason, we track the state of each RM's
processing and, when the sync() call is executed, mark as synced only the
contiguous run of fully written RMs.
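A minimal sketch of that contiguous-run rule, with hypothetical names and an
index-based tracker (the real patch may track state differently):
{code:java}
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch only: an entry can be acknowledged once it and every entry before it
// in the segment are on disk. Intended use by the syncer thread:
//   int end = tracker.writtenPrefixEnd(firstUnsynced); // snapshot BEFORE the sync
//   buffer.force();                                    // sync the mmap'd segment
//   tracker.markSynced(firstUnsynced, end);            // acknowledge that prefix only
class BatchSyncTracker
{
    private static final int PENDING = 0, WRITTEN = 1, SYNCED = 2;

    private final AtomicIntegerArray states;

    BatchSyncTracker(int capacity)
    {
        states = new AtomicIntegerArray(capacity); // all PENDING initially
    }

    /** Writer thread: entry 'index' is fully serialized, checksummed and copied. */
    void markWritten(int index)
    {
        states.set(index, WRITTEN);
    }

    /** First index after the contiguous run of WRITTEN entries starting at 'from'. */
    int writtenPrefixEnd(int from)
    {
        int i = from;
        while (i < states.length() && states.get(i) == WRITTEN)
            i++;
        return i;
    }

    /** Syncer thread, after the sync: entries [from, to) can now be acknowledged. */
    void markSynced(int from, int to)
    {
        for (int i = from; i < to; i++)
            states.set(i, SYNCED);
    }

    boolean isSynced(int index)
    {
        return states.get(index) == SYNCED;
    }
}
{code}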
Since we avoid any blocking queue, we still need a way to put the writer
threads on hold while the sync is being performed. LockSupport.park()/unpark()
provides a nice way to do it without relying on any coarse-grained
synchronization, and it avoids any condition reuse/renewal issue.
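For instance, a writer could park itself until the syncer reports that its
entry is covered by a sync (again a sketch with names of my own, not the
patch's):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.LockSupport;

// Sketch only: park()/unpark() instead of a monitor or blocking queue.
class BatchAckBarrier
{
    // writers register themselves under the index of the entry they wrote
    private final Map<Integer, Thread> waiters = new ConcurrentHashMap<Integer, Thread>();
    private volatile int syncedUpTo = 0; // exclusive bound of synced entries

    /** Writer thread: block until entry 'index' has been covered by a sync. */
    void awaitSynced(int index)
    {
        waiters.put(index, Thread.currentThread());
        // park() may return spuriously, so always re-check the condition
        while (index >= syncedUpTo)
            LockSupport.park(this);
        waiters.remove(index);
    }

    /** Syncer thread: after the sync, wake every writer whose entry is now durable. */
    void signalSynced(int upTo)
    {
        syncedUpTo = upTo;
        for (Map.Entry<Integer, Thread> e : waiters.entrySet())
            if (e.getKey() < upTo)
                LockSupport.unpark(e.getValue());
    }
}
{code}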
*Periodic CL*
The Periodic CL's challenge is mostly around throttling the writers, as here
again we avoid any synchronized queue in order to reduce contention. Actually
we only need "half a blocking queue", as nothing is really added or consumed.
For this reason we just use an atomic counter and an empty/full pair of
conditions. Here again, a pool of conditions and a stamp are used to avoid the
ABA problem.
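The shape of that "half a blocking queue" is roughly the following (a sketch
with a plain park/unpark loop standing in for the pool of stamped conditions):
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Sketch only: writers bump a counter of unsynced mutations and park when the
// threshold is reached; the syncer resets the counter after each sync and
// wakes them. Nothing is ever dequeued: it is "half a blocking queue".
class PeriodicThrottle
{
    private final int maxUnsynced;
    private final AtomicInteger unsynced = new AtomicInteger(0);
    private final Queue<Thread> parked = new ConcurrentLinkedQueue<Thread>();

    PeriodicThrottle(int maxUnsynced)
    {
        this.maxUnsynced = maxUnsynced;
    }

    /** Writer thread: called before writing one mutation into the segment. */
    void acquire()
    {
        while (true)
        {
            int current = unsynced.get();
            if (current < maxUnsynced && unsynced.compareAndSet(current, current + 1))
                return;
            // "full": park until the syncer drains the counter
            parked.add(Thread.currentThread());
            if (unsynced.get() >= maxUnsynced) // re-check to avoid a missed wakeup
                LockSupport.park(this);
            parked.remove(Thread.currentThread());
        }
    }

    /** Syncer thread: called after each periodic sync. */
    void release()
    {
        unsynced.set(0);
        Thread t;
        while ((t = parked.poll()) != null)
            LockSupport.unpark(t);
    }
}
{code}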
*End of Segment marker*
Another point is that this implementation doesn't use any End of Segment
marker. As we now have several concurrent writers, it's no longer possible to
write a temporary marker after an entry. That means the recently committed code
that fixes CASSANDRA-3615 is obviously not included in this patch.
Nevertheless, a mechanism to avoid unwanted replay of entries from a recycled
segment is still required. I haven't included it in the patch as I think it's a
design choice that needs to be debated, but it seems straightforward to
implement. The options I can see are the following:
- Fill the CL segment file with zeros on recycling. Doing so avoids any problem
but will typically require a several-second write on recycling, which will lead
to write latency hiccups.
- Include the segment id in every entry. This avoids any problem as well but
increases the entry size by 8 bytes, which has a cost but isn't dramatic, and
can be seen as spreading the cost of the previous option over the entire CL
writing.
- Salt the two checksums included in each entry with the segment id. Doing so
lowers the probability of any unwanted replay to a level that seems fairly
acceptable. The advantage of this solution is that its performance cost is nil
(sketched below).
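For that last option, the idea is simply to mix the segment id into the CRC so
an entry left over from a previous use of a recycled segment fails verification
at replay time. A minimal sketch, assuming a CRC32 checksum and an 8-byte
segment id (the actual CL entry format may differ):
{code:java}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch only: salt the checksum with the id of the segment being written,
// and verify at replay time with the id of the segment being read.
final class SaltedChecksum
{
    static long checksum(long segmentId, byte[] entry)
    {
        CRC32 crc = new CRC32();
        // salt with the 8 bytes of the segment id first...
        crc.update(ByteBuffer.allocate(8).putLong(segmentId).array());
        // ...then checksum the entry bytes as usual
        crc.update(entry, 0, entry.length);
        return crc.getValue();
    }

    /** Replay side: an entry written under another segment id will not match. */
    static boolean verify(long currentSegmentId, byte[] entry, long storedCrc)
    {
        return checksum(currentSegmentId, entry) == storedCrc;
    }
}
{code}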
Finally, here are some noteworthy observations:
* Here the writer thread WAITS for the processing to complete. Compared to a
_push-on-queue-and-forget_ approach, this slightly increases write latency when
using the Periodic CL (the Batch CL still being synchronous), especially for
large RMs. Nevertheless, on a highly loaded server, the next writes waiting to
be executed would have to wait for their thread to be scheduled anyway, so the
latency cost might end up being paid either way. Increasing the number of
writer threads should help make small RMs less sensitive to large RMs.
* If extensive benchmarks show that the previous point is an issue, there's
some room to make the Periodic CL asynchronous with respect to the writer
threads.
* To reduce as much as possible the contention on the atomic state that can be
modified several times by each thread, some naughty packing of several states
within a single AtomicLong is used, as it decreases the likelihood of an extra
spin compared to a more classical AtomicReference approach to non-blocking
synchronization (a small illustration follows this list). The downside is code
complexity, so I think AtomicReference remains an option to make the code more
readable and maintainable.
* For now, to ensure the required throttling of incoming RMs, we use a constant
function with a fixed threshold of unsynced mutations. But we now have the
tools to easily make the function more complex, for instance making it
non-constant and taking the size of the mutations into account (a sketch of
such a function also follows this list).
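To illustrate the AtomicLong packing mentioned above (the field widths and
names here are made up, not the patch's), two pieces of state are folded into
one long so both can be read and updated with a single CAS:
{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: lower 40 bits = write position in the segment,
// upper 24 bits = number of in-flight entries.
// Overflow and segment-full handling are omitted.
class PackedSegmentState
{
    private static final int POSITION_BITS = 40;
    private static final long POSITION_MASK = (1L << POSITION_BITS) - 1;

    private final AtomicLong state = new AtomicLong(0);

    /** Reserve 'size' bytes and bump the in-flight count in one CAS. */
    long reserve(int size)
    {
        while (true)
        {
            long current = state.get();
            long position = current & POSITION_MASK;
            long inFlight = current >>> POSITION_BITS;
            long next = ((inFlight + 1) << POSITION_BITS) | (position + size);
            if (state.compareAndSet(current, next))
                return position; // offset where this entry may be written
        }
    }

    int inFlight()
    {
        return (int) (state.get() >>> POSITION_BITS);
    }

    long position()
    {
        return state.get() & POSITION_MASK;
    }
}
{code}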
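And the throttling function from the last point could evolve along these lines
(purely hypothetical interface and thresholds):
{code:java}
// Sketch only: today the throttle is a constant threshold on the number of
// unsynced mutations; the same hook could also look at the unsynced bytes.
interface ThrottleFunction
{
    /** @return true when the writer must wait for the next sync */
    boolean shouldThrottle(int unsyncedMutations, long unsyncedBytes);
}

final class ConstantThreshold implements ThrottleFunction
{
    public boolean shouldThrottle(int unsyncedMutations, long unsyncedBytes)
    {
        return unsyncedMutations >= 1024; // fixed, size-agnostic threshold
    }
}

final class SizeAwareThreshold implements ThrottleFunction
{
    public boolean shouldThrottle(int unsyncedMutations, long unsyncedBytes)
    {
        // also bound the amount of unsynced data, e.g. 32 MB
        return unsyncedMutations >= 1024 || unsyncedBytes >= 32L * 1024 * 1024;
    }
}
{code}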
> Multithreaded commitlog
> -----------------------
>
> Key: CASSANDRA-3578
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3578
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jonathan Ellis
> Priority: Minor
> Attachments: parallel_commit_log_2.patch
>
>
> Brian Aker pointed out a while ago that allowing multiple threads to modify
> the commitlog simultaneously (reserving space for each with a CAS first, the
> way we do in the SlabAllocator.Region.allocate) can improve performance,
> since you're not bottlenecking on a single thread to do all the copying and
> CRC computation.
> Now that we use mmap'd CommitLog segments (CASSANDRA-3411) this becomes
> doable.
> (moved from CASSANDRA-622, which was getting a bit muddled.)