[ 
https://issues.apache.org/jira/browse/CASSANDRA-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117606#comment-15117606
 ] 

Jim Witschey commented on CASSANDRA-10995:
------------------------------------------

[~benedict] Thanks for your comments.

bq. But only if the population is very small, since you would need for the data 
to occur multiple times on a single page.

I've been assuming that compression happened per-sstable -- am I wrong about 
that? Is this behavior documented somewhere?

bq. Realistically a dictionary generator should be added, which is not very 
hard, and was on my todo list for a long time. That or a weighted random byte 
generator, that is more likely to produce certain bytes (or byte sequences) 
than others, which would avoid the necessity of a dictionary while providing 
the same benefit.

Good idea. [~tjake]: If it's as simple as Benedict indicates, how soon could 
you put together a basic dictionary generator? If a usable version were on a 
branch somewhere in the next few days, it'd be useful for this benchmark. As 
discussed, though, I can work around it if you aren't able.
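
For reference, a minimal sketch of the weighted random byte generator idea quoted above. It is written in Python and is not cassandra-stress code. The function names, the Zipf-like weighting, the skew value, and the sample size are all illustrative assumptions. It only shows that skewing the byte distribution makes generated data compressible, where uniformly random bytes are not.

```python
import random
import zlib


def weighted_bytes(n, rng, skew=1.2):
    """Return n random bytes where low byte values are far more likely than high ones."""
    # Zipf-like weights (an assumption for illustration):
    # byte value b gets weight 1 / (b + 1) ** skew.
    weights = [1.0 / (b + 1) ** skew for b in range(256)]
    return bytes(rng.choices(range(256), weights=weights, k=n))


def uniform_bytes(n, rng):
    """Return n random bytes with every byte value equally likely."""
    return bytes(rng.getrandbits(8) for _ in range(n))


def compression_ratio(data):
    """Compressed size divided by original size; lower means more compressible."""
    return len(zlib.compress(data)) / len(data)


rng = random.Random(42)  # fixed seed so runs are repeatable
size = 64 * 1024         # arbitrary sample size for the comparison

uniform_ratio = compression_ratio(uniform_bytes(size, rng))
weighted_ratio = compression_ratio(weighted_bytes(size, rng))

print(f"uniform:  {uniform_ratio:.2f}")
print(f"weighted: {weighted_ratio:.2f}")
```

The uniform sample should come out at a ratio of about 1.0 (no savings), while the weighted sample should compress noticeably. Raising the hypothetical `skew` parameter makes the data more compressible.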

> Consider disabling sstable compression by default in 3.x
> --------------------------------------------------------
>
>                 Key: CASSANDRA-10995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10995
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Jim Witschey
>
> With the new sstable format introduced in CASSANDRA-8099, it's very likely 
> that enabled sstable compression is no longer the right default option.
> [~slebresne]'s [blog post|http://www.datastax.com/2015/12/storage-engine-30] 
> on the new storage engine has some comparison numbers for 2.2/3.0, with and
> without compression, which show that in many cases compression no longer has a
> significant effect on sstable sizes, all while still consuming extra
> resources for both writes (compression) and reads (decompression).
> We should run a comprehensive set of benchmarks to determine whether or not 
> compression should be switched to 'off' now in 3.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
