[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866295#action_12866295
 ] 

Kazuki Ohta commented on CASSANDRA-47:
--------------------------------------

Just a comment. SSTable compression is very useful for storing large web pages. 
By using an order-preserving hash, we can store web pages from the same domain 
close together, possibly in the same SSTable.

In that case, the vcdiff algorithm (the Bentley-McIlroy '99 scheme) can 
effectively compress long common strings. Many web pages today are built from 
the same templates, so this algorithm can eliminate the template part and 
leave only the content part. I've blogged about this algorithm.

- http://kzk9.net/b/2010/02/vcdiff-data-compression-using-long-common-strings/
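
To illustrate the idea, here is a minimal Python sketch of long-common-string 
compression in the spirit of the Bentley-McIlroy scheme. It is a simplification 
under stated assumptions, not the VCDIFF wire format and not Cassandra code: 
the function names, the op tuples, and the block size of 16 are all 
hypothetical, and a plain dict keyed on block contents stands in for the 
rolling fingerprints a real implementation would use.

```python
def compress(data: bytes, block: int = 16):
    """Return a list of ops: ("lit", bytes) or ("copy", start, length)."""
    table = {}       # block contents -> earliest block-aligned offset
    ops = []
    lit_start = 0    # start of the pending literal run
    next_index = 0   # next block-aligned offset not yet indexed
    i = 0
    n = len(data)
    while i + block <= n:
        # Index only complete, block-aligned blocks that end at or before i.
        while next_index + block <= i:
            table.setdefault(data[next_index:next_index + block], next_index)
            next_index += block
        src = table.get(data[i:i + block])
        if src is None:
            i += 1
            continue
        # Extend the match forward as far as the bytes keep agreeing.
        length = block
        while i + length < n and data[src + length] == data[i + length]:
            length += 1
        if lit_start < i:
            ops.append(("lit", data[lit_start:i]))
        ops.append(("copy", src, length))
        i += length
        lit_start = i
    if lit_start < n:
        ops.append(("lit", data[lit_start:]))
    return ops


def decompress(ops):
    """Rebuild the original bytes from the ops produced by compress()."""
    out = bytearray()
    for op in ops:
        if op[0] == "lit":
            out += op[1]
        else:
            _, src, length = op
            # Copy byte by byte so a copy may overlap its own output.
            for k in range(length):
                out.append(out[src + k])
    return bytes(out)
```

If two pages that share a header and footer are concatenated and passed to 
compress(), the second page's template parts should come out as "copy" ops 
that point back into the first page, leaving mostly the page-specific content 
as literals.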

I think this will open up huge opportunities for Cassandra. It works well even 
within a single block. If compression becomes pluggable, I would like to 
implement this algorithm.


> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Eric Evans
>            Priority: Minor
>             Fix For: 0.8
>
>
> We should be able to do SSTable compression which would trade CPU for I/O 
> (almost always a good trade).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
