[ 
https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231203#comment-14231203
 ] 

Benedict commented on CASSANDRA-7203:
-------------------------------------

I was _mostly_ hoping to get your and [~kohlisankalp]'s views on _whether 
those workload skews occur_. Then we could at some point later get into the 
nitty-gritty of whether it would be worth it :-)

The idea wouldn't really be to special-case anything except flush, and to 
depend on (and be implemented after) improvements we have either envisaged or 
could later envisage to avoid compacting sstables with low predicted overlap 
of partitions. That is, it would have the potential to improve the benefit of 
such schemes, by increasing the number of sstable pairings they can rule out.

> Flush (and Compact) High Traffic Partitions Separately
> ------------------------------------------------------
>
>                 Key: CASSANDRA-7203
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: compaction, performance
>
> An idea possibly worth exploring is the use of streaming count-min sketches 
> to collect data over the up-time of a server to estimate the velocity of 
> different partitions, so that high-volume partitions can be flushed 
> separately on the assumption that they will be much smaller in number, thus 
> reducing write amplification by permitting compaction independently of any 
> low-velocity data.
> Whilst the idea is reasonably straightforward, it seems that the biggest 
> problem here will be defining any success metric. Obviously any workload 
> following an exponential/zipf/extreme distribution is likely to benefit from 
> such an approach, but whether or not that would translate in real terms is 
> another matter.
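
To make the count-min sketch idea in the description concrete, here is a minimal 
sketch in Python. It is only an illustration under stated assumptions, not 
Cassandra code: `CountMinSketch`, `split_for_flush`, the `hot_threshold` 
parameter and the example keys are all hypothetical names, and raw write counts 
stand in for "velocity" (no time decay or windowing is modelled).

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator: never under-counts, may over-count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        # One hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        """Record `count` writes against a partition key."""
        for row, col in self._buckets(key):
            self.rows[row][col] += count

    def estimate(self, key):
        """Return the smallest counter seen for the key across all rows."""
        return min(self.rows[row][col] for row, col in self._buckets(key))


def split_for_flush(partition_keys, sketch, hot_threshold):
    """Split keys into (hot, cold) by estimated write count (hypothetical policy)."""
    hot, cold = [], []
    for key in partition_keys:
        (hot if sketch.estimate(key) >= hot_threshold else cold).append(key)
    return hot, cold


# Skewed toy workload: one heavily written partition, many lightly written ones.
sketch = CountMinSketch()
for _ in range(1000):
    sketch.add("user:42")
for i in range(200):
    sketch.add(f"user:cold-{i}")

hot, cold = split_for_flush(["user:42", "user:cold-7"], sketch, hot_threshold=100)
```

The two lists would then be written to separate sstables at flush, so that the 
small set of hot partitions can be compacted without rewriting the cold data. 
How to pick the threshold, and whether real workloads are skewed enough for it 
to pay off, is exactly the open question raised above.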



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
