[ 
https://issues.apache.org/jira/browse/CASSANDRA-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13525993#comment-13525993
 ] 

Adrien Grand commented on CASSANDRA-5038:
-----------------------------------------

bq. Cool, yeah I'm not sure if we can use the "known size" decompressor, does 
it have to be exact or can it be upper bounded? We know from the block size the 
max compressed length.

It needs to be exact, or decompression will fail. One way to use it anyway is 
to write the original length as an int (or, better, as a variable-length int) 
before the compressed bytes. On decompression, first read the original length, 
then pass it to the "known size" decompressor.
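
A minimal sketch of the length prefix, assuming a 7-bits-per-byte 
variable-length encoding. The class and method names (LengthPrefix, writeVInt, 
readVInt, vIntSize) are hypothetical and not part of the LZ4 library; the 
actual compress and decompress calls are left out.

```java
/** Hypothetical helper: stores the original length as a variable-length int. */
public final class LengthPrefix {
    private LengthPrefix() {}

    /** Writes value at dest[off], 7 bits per byte; returns the offset just past it. */
    public static int writeVInt(int value, byte[] dest, int off) {
        if (value < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        while ((value & ~0x7F) != 0) {
            dest[off++] = (byte) ((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        dest[off++] = (byte) value; // last byte, high bit clear
        return off;
    }

    /** Reads the variable-length int that starts at src[off]. */
    public static int readVInt(byte[] src, int off) {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = src[off++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("malformed length prefix");
    }

    /** Number of bytes the encoding of a non-negative value occupies (1 to 5). */
    public static int vIntSize(int value) {
        int size = 1;
        while ((value >>>= 7) != 0) {
            size++;
        }
        return size;
    }
}
```

The writer would call writeVInt(originalLength, dest, 0) and compress into dest 
starting at the returned offset. The reader would call readVInt, skip 
vIntSize(originalLength) bytes, and hand exactly originalLength to the "known 
size" decompressor.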

bq.  I'd suggest you add a simple way for us to pick the best compressor for 
our node.

This is what LZ4Factory#defaultInstance (I should probably rename it to 
fastestInstance) aims to do, but right now it only tries the unsafe impl and 
then the safe one. I'll try to add support for the native impl soon.

Another feature of these compressors you might be interested in is that you can 
provide them with an output buffer of any length and they will succeed only if 
they managed to generate an output which is small enough (and they will fail as 
soon as they know they won't make it). So for example, you could decide to 
write the raw bytes instead of the compressed bytes if LZ4 didn't manage to 
compress your data by more than 10%:

{code}
  final int maxAcceptableCompressedLength = originalLength * 90 / 100;
  try {
    dest[0] = 0; // means compressed
    final int compressedLength = compressor.compress(src, 0, originalLength,
        dest, 1, maxAcceptableCompressedLength);
    return 1 + compressedLength;
  } catch (LZ4Exception e) {
    dest[0] = 1; // means not compressed
    System.arraycopy(src, 0, dest, 1, originalLength);
    return 1 + originalLength;
  }
{code}
(Only the native LZ4 HC impl doesn't support this feature.)
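
For completeness, the reading side of that flag byte might look like the sketch 
below. Everything in it is an assumption made for illustration: the 
FlaggedBlock class, the Decompressor interface (a stand-in for whatever "known 
size" decompressor is in use), and the premise that the reader already knows 
originalLength, for example from a length prefix or from the block size.

```java
/** Hypothetical reader for blocks whose first byte says "compressed" or "raw". */
public final class FlaggedBlock {
    private FlaggedBlock() {}

    /** Stand-in for a "known size" decompressor; not the LZ4 library's interface. */
    public interface Decompressor {
        void decompress(byte[] src, int srcOff, byte[] dest, int destOff, int destLen);
    }

    /**
     * Decodes block[0..blockLen) into a new array of originalLength bytes.
     * Flag 0 means the payload is compressed, flag 1 means it was stored raw.
     */
    public static byte[] decode(byte[] block, int blockLen, int originalLength,
                                Decompressor decompressor) {
        final byte[] out = new byte[originalLength];
        switch (block[0]) {
            case 0: // compressed payload
                decompressor.decompress(block, 1, out, 0, originalLength);
                break;
            case 1: // raw payload, must be exactly originalLength bytes
                if (blockLen - 1 != originalLength) {
                    throw new IllegalArgumentException("raw payload length mismatch");
                }
                System.arraycopy(block, 1, out, 0, originalLength);
                break;
            default:
                throw new IllegalArgumentException("unknown flag: " + block[0]);
        }
        return out;
    }
}
```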

                
> LZ4Compressor
> -------------
>
>                 Key: CASSANDRA-5038
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5038
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: T Jake Luciani
>            Priority: Minor
>             Fix For: 1.2.1
>
>         Attachments: LZ4Compressor.java, lz4-java.jar
>
>
> LZ4 is a new compression algo that's ~2x faster than Snappy.
> [~jpountz] has written a nice java port which includes a misc.Unsafe version 
> that performs >= than our java snappy version.
> Details at http://blog.jpountz.net/post/28092106032/wow-lz4-is-fast
> The nice thing is this should work with java7 and be more portable.
> We can also fallback the pure java impl

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
