[ 
https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275726#comment-14275726
 ] 

Ariel Weisberg edited comment on CASSANDRA-6809 at 1/13/15 7:26 PM:
--------------------------------------------------------------------

I finished my review. Comments are in the pull request. It looks good and could 
ship as is, modulo some minor changes and provided the right tests are there (or 
get created). I have some thoughts about potential scope creep I would advocate 
for, some other directions for enhancement the commit log could go in, and 
some reservations about performance in some cases. I only just noticed 
CommitLog stress, so I need to check that out to understand the numbers 
and what is being tested.

RE CASSANDRA-7075, multiple CL disks: I see this as a workaround for not 
having RAID-0 of the volumes being used for the CL, and that is it. It may 
introduce its own load balancing issues, as well as a mess of code for 
scattering/gathering mutations that I am less comfortable with. Writing a CL 
pipeline that can do the maximum supported sequential IO to a single file is 
doable, and if I had a choice it is what I would rather write. From a user 
perspective it is a nice feature not to be forced to provide a RAID volume, 
and to me that should be the primary motivation.

Also, a fascinating (to me) piece of trivia: when I tested in the past, I could 
call force() on a mapped byte buffer far fewer times per second than I could 
call force() on a FileChannel. With a battery-backed disk controller, appending 
a page (in a preallocated file) and calling force() in a loop with a 
MappedByteBuffer would do a few hundred syncs a second, but with 
FileChannel.force() it would do a few thousand. MBB was slow enough to be a 
concern for synchronous commits.
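
A rough sketch of how such a comparison could be set up, not the original 
test: the class name, the 4096-byte page size, the temp-file location and the 
page count are all assumptions chosen for illustration, and the rates it prints 
depend entirely on the hardware and filesystem.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical micro-benchmark: append one page, sync, repeat.
final class ForceComparison {
    static final int PAGE = 4096; // assumed page size

    // Create a temp file that is already `pages` pages long (preallocated).
    static Path preallocate(int pages) throws IOException {
        Path file = Files.createTempFile("force-test", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[pages * PAGE]);
        return file;
    }

    // Write each page through the channel and call FileChannel.force(false).
    static long syncWithChannel(Path file, int pages) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer page = ByteBuffer.allocateDirect(PAGE);
            long start = System.nanoTime();
            for (int i = 0; i < pages; i++) {
                page.clear();
                long pos = (long) i * PAGE;
                while (page.hasRemaining()) {
                    pos += ch.write(page, pos);
                }
                ch.force(false); // data only, no metadata
            }
            return System.nanoTime() - start;
        }
    }

    // Write each page into a mapping and call MappedByteBuffer.force().
    static long syncWithMapping(Path file, int pages) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) pages * PAGE);
            byte[] page = new byte[PAGE];
            long start = System.nanoTime();
            for (int i = 0; i < pages; i++) {
                map.put(page);
                map.force();
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        int pages = 1000;
        Path file = preallocate(pages);
        long chNanos = Math.max(1, syncWithChannel(file, pages));
        long mbbNanos = Math.max(1, syncWithMapping(file, pages));
        System.out.printf("FileChannel.force:      %.0f syncs/s%n", pages * 1e9 / chNanos);
        System.out.printf("MappedByteBuffer.force: %.0f syncs/s%n", pages * 1e9 / mbbNanos);
    }
}
```

Note that MappedByteBuffer.force() here syncs the whole mapping on every call, 
which is one plausible reason for a gap; the sketch does not try to establish 
the cause.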



was (Author: aweisberg):
I finished my review. Comments are in the pull request. It looks good and could 
ship as is. I have some thoughts about potential scope creep I would advocate 
for, some other directions for enhancement the commit log could go in, and 
some reservations about performance in some cases. I only just noticed 
CommitLog stress, so I need to check that out to understand the numbers 
and what is being tested.

RE CASSANDRA-7075, multiple CL disks: I see this as a workaround for not 
having RAID-0 of the volumes being used for the CL, and that is it. It may 
introduce its own load balancing issues, as well as a mess of code for 
scattering/gathering mutations that I am less comfortable with. Writing a CL 
pipeline that can do the maximum supported sequential IO to a single file is 
doable, and if I had a choice it is what I would rather write. From a user 
perspective it is a nice feature not to be forced to provide a RAID volume, 
and to me that should be the primary motivation.

Also, a fascinating (to me) piece of trivia: when I tested in the past, I could 
call force() on a mapped byte buffer far fewer times per second than I could 
call force() on a FileChannel. With a battery-backed disk controller, appending 
a page (in a preallocated file) and calling force() in a loop with a 
MappedByteBuffer would do a few hundred syncs a second, but with 
FileChannel.force() it would do a few thousand. MBB was slow enough to be a 
concern for synchronous commits.


> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log. 
> Doing so should improve throughput, but some care will need to be taken to 
> ensure we use as much of a segment as possible. I propose decoupling the 
> writing of the records from the segments. Basically write into a (queue of) 
> DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X 
> MB written to the CL (where X is ordinarily CLS size), and then pack as many 
> of the compressed chunks into a CLS as possible.
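
The chunk-and-pack idea in the description above can be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the patch 
under review: the ChunkPacker class, the length-prefixed chunk framing, and the 
use of java.util.zip.Deflater as the codec are all hypothetical stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

// Hypothetical sketch: compress ~64K chunks and pack as many as fit into a segment.
final class ChunkPacker {
    static final int CHUNK_SIZE = 64 * 1024;

    // Compress raw[offset, offset + length) with Deflater (stand-in codec).
    static byte[] compress(byte[] raw, int offset, int length) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw, offset, length);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Pack compressed chunks into `segment` until the next one does not fit.
    // Assumed framing per chunk: [raw length][compressed length][compressed bytes].
    // Returns how many raw bytes were consumed; the rest belongs in the next segment.
    static int pack(byte[] raw, ByteBuffer segment) {
        int consumed = 0;
        while (consumed < raw.length) {
            int len = Math.min(CHUNK_SIZE, raw.length - consumed);
            byte[] compressed = compress(raw, consumed, len);
            if (segment.remaining() < 2 * Integer.BYTES + compressed.length) {
                break; // segment is full
            }
            segment.putInt(len).putInt(compressed.length).put(compressed);
            consumed += len;
        }
        return consumed;
    }
}
```

A caller would keep invoking pack() with a fresh segment buffer, starting from 
the unconsumed remainder, until all buffered records have been written.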



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
