[ 
https://issues.apache.org/jira/browse/CASSANDRA-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-3001:
----------------------------------------

    Attachment: 0002-Add-deflate-compressor.patch
                0001-Pluggable-algorithm-and-chunk-length.patch

Attaching patch to make the compression algorithm configurable, as well as the 
chunk length. It implements the idea of having compression be "similar" to the 
compaction strategies as far as thrift is concerned.

Speaking of the chunk length, its default value is 65535, which is 64k-1, not 
64k. I think this is a problem because of the following line in 
CRAR.decompressChunk:
{noformat}
        // buffer offset is always aligned
        bufferOffset = current & ~(buffer.length - 1);
{noformat}
which I believe only works if buffer.length is a power of 2 (which 64k-1 is 
not). We should either change this line or enforce that the chunk length is a 
power of two. The attached patch chooses the second solution, enforcing a power 
of 2 length (and thus sets the default chunk length to 65536).
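
To illustrate the alignment concern, here is a minimal standalone sketch (not code from the patch). The class and method names are hypothetical; only the masking expression mirrors the quoted line, with an explicit cast to long added for clarity. It compares the mask against a division-based alignment for both 65535 and 65536:

```java
// Hypothetical demo class; not part of Cassandra or the attached patch.
final class ChunkAlignment {
    // Mask-based alignment, as in the quoted line.
    // Only correct when length is a power of two.
    static long alignWithMask(long position, int length) {
        return position & ~(long) (length - 1);
    }

    // Division-based alignment; correct for any positive length.
    static long alignWithDivision(long position, int length) {
        return (position / length) * length;
    }

    // The kind of check that enforcing a power-of-two chunk length implies.
    static boolean isPowerOfTwo(int length) {
        return length > 0 && (length & (length - 1)) == 0;
    }

    public static void main(String[] args) {
        // With length 65535, position 65535 is the start of the second chunk,
        // but the mask yields 1 instead of 65535.
        System.out.println(alignWithMask(65535L, 65535));     // 1
        System.out.println(alignWithDivision(65535L, 65535)); // 65535
        // With length 65536, both approaches agree.
        System.out.println(alignWithMask(70000L, 65536));     // 65536
        System.out.println(alignWithDivision(70000L, 65536)); // 65536
    }
}
```

The mask clears the low bits of the position, which equals rounding down to a multiple of the length only when the length is a single set bit.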

The second attached patch adds a compressor based on Java's default deflate 
implementation. Sadly, I haven't found a way to compute in advance the 
maximum size a piece of compressed data can take (that is, an equivalent to 
Snappy.maxCompressedLength()), so the patch slightly modifies the 
ICompressor interface to allow the compression function to resize the buffer if 
need be. This is arguably not very elegant, though it works. Besides, I haven't 
run any real benchmarks, but given the time it takes to compact the 
result of a default stress session, this sounds very slow (though it does result in 
non-negligibly smaller files than Snappy). I don't know if we want to commit that 
part, but it felt reasonable to try it at the very least.
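
For readers unfamiliar with the resize approach, the following is a rough sketch of deflate compression into a buffer that grows on demand, using java.util.zip.Deflater and Inflater. It is not the patch's code and does not reproduce the ICompressor interface; the class name, method names, and the doubling growth policy are assumptions made for illustration:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; not the compressor from the attached patch.
final class DeflateSketch {
    // Compress input, doubling the output buffer whenever it fills up,
    // since no upper bound on the compressed size is computed in advance.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] output = new byte[Math.max(64, input.length)];
            int written = 0;
            while (!deflater.finished()) {
                if (written == output.length) {
                    // Buffer is full but the deflater has more to emit: grow it.
                    output = Arrays.copyOf(output, output.length * 2);
                }
                written += deflater.deflate(output, written, output.length - written);
            }
            return Arrays.copyOf(output, written);
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Decompress into a buffer of known size (a chunk's uncompressed length).
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] output = new byte[originalLength];
            int read = 0;
            while (read < originalLength) {
                int n = inflater.inflate(output, read, originalLength - read);
                if (n == 0 && (inflater.finished() || inflater.needsInput()
                        || inflater.needsDictionary())) {
                    throw new IllegalStateException("compressed data ended early");
                }
                read += n;
            }
            return output;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Incompressible input is the case that triggers the growth path: deflate output can be slightly larger than the input, so a buffer sized to the input length is not always enough.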


> Make the compression algorithm and chunk length configurable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-3001
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3001
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: 0001-Pluggable-algorithm-and-chunk-length.patch, 
> 0002-Add-deflate-compressor.patch
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
