[ 
https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802167#comment-13802167
 ] 

Tyler Hobbs edited comment on CASSANDRA-6109 at 10/22/13 7:33 PM:
------------------------------------------------------------------

bq. What if we just added a bucket filter that said, SSTables representing less 
than X% of the reads will not be bucketed?

To be clear, you're suggesting ignoring _buckets_ whose reads make up less than 
X% of the total reads/sec for the table, correct?

bq. Straightforward to tune and I can't think of any really pathological cases, 
other than where size-tiering just doesn't put hot overlapping sstables in the 
same bucket. 

This is definitely easier to tune.

One case I'm concerned about is where the max compaction threshold prevents a 
bucket from ever being above X% of the total reads/sec, especially with new, 
small SSTables.  If we compare reads _per key_ per second instead of just 
reads/sec, that case goes away.  Additionally, while comparing reads/sec would 
focus compactions on the largest SSTables, comparing reads per key per second 
would focus on compacting the hottest SSTables, which is an improvement.  With 
that change, I really like this strategy.
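To make the comparison concrete, here is a minimal sketch of the proposed filter (illustrative Python only, not Cassandra's actual code; the `filter_cold_buckets` helper and the `reads_per_sec` / `estimated_keys` field names are hypothetical):

```python
# Hypothetical sketch: drop "cold" buckets whose reads-per-key-per-second
# fall below a fraction of the table-wide total, so compaction focuses on
# hot sstables rather than merely large ones.

def hotness(sstable):
    """Reads per key per second for one sstable (hypothetical fields)."""
    return sstable["reads_per_sec"] / sstable["estimated_keys"]

def filter_cold_buckets(buckets, threshold=0.05):
    """Keep only buckets whose combined hotness is at least `threshold`
    of the table-wide total hotness."""
    total = sum(hotness(s) for b in buckets for s in b)
    if total == 0:
        return buckets  # no read activity: fall back to plain size-tiering
    return [b for b in buckets
            if sum(hotness(s) for s in b) / total >= threshold]
```

Normalizing by key count is what keeps a bucket of new, small sstables (whose size is capped by the max compaction threshold) from being permanently stuck below the cutoff, which is the case raised above.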

As far as the default threshold goes, I'll suggest a conservative 2 to 5%.  
Here's my thought process: there are usually roughly 5 tiers, so each tier 
should get about 20% of the total reads per key per second if all SSTables were 
equally hot.  Cold sstables should see less than 10 to 25% of that normal read 
rate, which gives a 2 to 5% threshold.
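Spelling that arithmetic out (an illustrative calculation, not code from the patch):

```python
# Worked arithmetic behind the suggested 2-5% default threshold.
tiers = 5
normal_share = 1.0 / tiers        # ~20% of reads/key/sec per tier if uniform
cold_low, cold_high = 0.10, 0.25  # "cold" = 10-25% of the normal rate
threshold_range = (normal_share * cold_low, normal_share * cold_high)
# threshold_range is (0.02, 0.05), i.e. the 2 to 5% suggested above
```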


> Consider coldness in STCS compaction
> ------------------------------------
>
>                 Key: CASSANDRA-6109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.2
>
>         Attachments: 6109-v1.patch, 6109-v2.patch
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again...  but 
> it's confusing if you have a workload such that you can't keep up with *all* 
> compaction, but you can keep up with hot sstables.  (Compaction backlog stat 
> becomes useless since we fall increasingly behind.)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
