[ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140658#comment-14140658 ]
Branimir Lambov commented on CASSANDRA-6809:
--------------------------------------------

Patch is available for review at https://github.com/blambov/cassandra/pull/2:

{panel}
Provides two implementations of commit log segments: one matching the previous memory-mapped writing method for uncompressed logs, and one that uses in-memory buffers and compresses the sections between sync markers before writing them to the log. Replay is changed to decompress these sections and to track the uncompressed position so that the replay position is identified correctly. The compression class and parameters are specified in cassandra.yaml and stored in the commit log descriptor. Tested by the test-compression target, which now enables LZ4 compression of commit logs in addition to compression for SSTables.
{panel}

[~jasobrown]: Using a writer interface would probably be a little cleaner from a design point of view, but if we want to preserve all features of the current approach, the two writing methods and the log segment class are so tightly coupled that it doesn't really matter. My measurements compared the number of fixed-size commit log writes that could be performed in a given time period (the CommitLogStress test introduced in CASSANDRA-3578 and slightly updated here). Memory-mapped IO does seem to provide some benefit, at least on Windows, which to me means we should not remove it yet.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log.
> Doing so should improve throughput, but some care will need to be taken to
> ensure we use as much of a segment as possible. I propose decoupling the
> writing of the records from the segments.
> Basically write into a (queue of)
> DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X
> MB written to the CL (where X is ordinarily CLS size), and then pack as many
> of the compressed chunks into a CLS as possible.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
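For context, the comment says the compression class and parameters are specified in cassandra.yaml; a configuration along these lines is what that might look like. The option name and parameter keys below (commitlog_compression, class_name, chunk_length_in_kb) are illustrative assumptions, not necessarily the exact syntax introduced by the patch:

```yaml
# Hypothetical cassandra.yaml fragment: enable commit log compression
# with a pluggable compressor class and optional parameters.
commitlog_compression:
  - class_name: LZ4Compressor
    parameters:
      chunk_length_in_kb: 64   # compress in ~64K sections between sync markers
```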
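The chunk-compress-and-pack scheme proposed in the description can be sketched as follows. This is a toy illustration, not the patch's code: it uses the JDK's Deflater as a stand-in for the pluggable compressor, a heap ByteBuffer instead of a queue of DirectByteBuffer, and a hypothetical 4-byte length-prefix framing so replay could locate chunk boundaries. All class and method names here are invented for the sketch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.zip.Deflater;

/**
 * Toy sketch of the proposed design: records accumulate in memory, the
 * sync thread compresses ~64K chunks, and as many compressed chunks as
 * fit are packed into a fixed-size commit log segment (CLS).
 */
public class ChunkedSegmentPacker {
    static final int CHUNK_SIZE = 64 * 1024;    // ~64K uncompressed chunks
    static final int SEGMENT_SIZE = 128 * 1024; // stand-in for the CLS size

    /** Compress one chunk; Deflater stands in for the configured compressor. */
    static byte[] compress(byte[] chunk) {
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        d.setInput(chunk);
        d.finish();
        byte[] out = new byte[chunk.length + 64]; // loose bound, enough for this sketch
        int len = d.deflate(out);
        d.end();
        byte[] exact = new byte[len];
        System.arraycopy(out, 0, exact, 0, len);
        return exact;
    }

    /** Pack compressed chunks into the segment until the next one no longer fits. */
    static int pack(Queue<byte[]> compressedChunks, ByteBuffer segment) {
        int packed = 0;
        while (!compressedChunks.isEmpty()
               && compressedChunks.peek().length + 4 <= segment.remaining()) {
            byte[] c = compressedChunks.poll();
            segment.putInt(c.length); // length prefix so replay can find chunk bounds
            segment.put(c);
            packed++;
        }
        return packed;
    }

    public static void main(String[] args) {
        // Simulate pending commit log data with highly compressible chunks.
        Queue<byte[]> compressed = new ArrayDeque<>();
        for (int i = 0; i < 8; i++) {
            byte[] chunk = new byte[CHUNK_SIZE];
            java.util.Arrays.fill(chunk, (byte) ('a' + i));
            compressed.add(compress(chunk));
        }
        ByteBuffer segment = ByteBuffer.allocate(SEGMENT_SIZE);
        int packed = pack(compressed, segment);
        System.out.println("packed=" + packed + " bytesUsed=" + segment.position());
    }
}
```

Note how the packing loop is what lets a segment hold far more than SEGMENT_SIZE bytes of logical writes when the data compresses well, which is the throughput benefit the ticket is after.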