[
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062381#comment-13062381
]
Pavel Yaskevich commented on CASSANDRA-47:
------------------------------------------
bq. This seems like an unrealistically good compression ratio. If I gzip a real
world SSTable that has redundant data that should be ripe for compression I
only see 641M-->217M. What's the gzip compression ratio with the SSTables that
stress.java workload generates?
You can easily test it yourself. For example, run
./bin/stress -S 1024 -n 1000000 -C 250 -V
then wait for compactions to finish and check the block size of the resulting
files (using ls -lahs). I see 3.8GB compressed into 781MB in my tests.
internal_op_rate with the current trunk code is around 450-500, but with the
current patch it is about 2800-3000 on a Quad-Core AMD Opteron(tm) Processor
2374 HE, 4229730MHz on each core, 2GB mem (rackspace instance). A cardinality of
250 is 5 times bigger than the default, and the -V option generates
average-size values.
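
As a rough illustration of the check above, here is a minimal Python sketch (not
part of the patch). It compares a file's apparent size with the space actually
allocated on disk, which is what the size and block columns of ls -lahs show,
and computes the ratios for the figures quoted in this thread. The helper names
on_disk_sizes and ratio are hypothetical, the sketch assumes a POSIX system
where st_blocks is counted in 512-byte units, and it takes 1GB = 1024MB.

```python
import os


def on_disk_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    Assumes a POSIX system, where st_blocks counts 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def ratio(compressed, original):
    """Compressed size as a fraction of the original size."""
    return compressed / original


# Figures quoted in this thread, both expressed in MB.
stress_ratio = ratio(781, 3.8 * 1024)  # stress.java data with the patch
gzip_ratio = ratio(217, 641)           # gzip of a real-world SSTable

print("stress workload: %.2f" % stress_ratio)
print("gzip real-world: %.2f" % gzip_ratio)
```

With those numbers the stress workload comes out at roughly 0.20 of the
original size and the gzip case at roughly 0.34, so the gap is largely a
property of the generated data rather than of the compressor alone.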
> SSTable compression
> -------------------
>
> Key: CASSANDRA-47
> URL: https://issues.apache.org/jira/browse/CASSANDRA-47
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Pavel Yaskevich
> Labels: compression
> Fix For: 1.0
>
> Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O
> (almost always a good trade).