[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

Tyler Hobbs (JIRA) Wed, 07 Aug 2013 12:07:30 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732588#comment-13732588
 ]


Tyler Hobbs commented on CASSANDRA-5519:
----------------------------------------

bq. We could define a fixed-size memory pool, similar to what we do for 
memtables or cache, and allocate it to the sstables proportional to their 
hotness.

It would be hard to describe this in text, so here's my Pythonic pseudocode for 
distributing the fixed-size memory pool:

{noformat}
total_reads_per_sec = sum(sstable.reads_per_sec for sstable in sstables)
sstables_to_downsample = set()
leftover_entries = 0
for sstable in sstables:
    allocated_space = total_space * (sstable.reads_per_sec / total_reads_per_sec)
    # space per entry = token + position + overhead
    num_entries = int(allocated_space / SPACE_PER_ENTRY)
    if num_entries > sstable.max_index_summary_entries:
        sstable.num_index_summary_entries = sstable.max_index_summary_entries
        leftover_entries += num_entries - sstable.max_index_summary_entries
    else:
        sstable.num_index_summary_entries = num_entries
        sstables_to_downsample.add(sstable)

# distribute leftover_entries among sstables_to_downsample based on read rates
# (this probably ends up looking like a recursive or iterative function)
{noformat}
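The pseudocode leaves the redistribution step open. One way to fill it in is a standard "water-filling" pass: visit the sstables in order of how soon they hit their cap (maximum entries per unit of read rate) and give each its read-rate share of whatever budget is still left, so entries freed by capped sstables flow to the rest in a single pass. This is a swapped-in technique, not necessarily what the comment intends, and `SSTable`, `allocate_summary_entries`, and the 64-byte `SPACE_PER_ENTRY` are hypothetical names and values, not Cassandra APIs.

```python
from dataclasses import dataclass

# Hypothetical size of one summary entry (token + position + overhead), in bytes.
SPACE_PER_ENTRY = 64


@dataclass
class SSTable:
    name: str
    reads_per_sec: float
    max_index_summary_entries: int      # entries at full (100%) sampling
    num_index_summary_entries: int = 0  # entries actually kept in memory


def allocate_summary_entries(sstables, total_space):
    """Split a fixed memory pool across sstables in proportion to read rate.

    An sstable whose proportional share exceeds its maximum is capped, and the
    unused entries are shared among the remaining sstables, again by read rate.
    """
    budget = total_space // SPACE_PER_ENTRY
    remaining_reads = sum(s.reads_per_sec for s in sstables)

    def saturation_point(s):
        # Entries per unit of read rate at which this sstable hits its cap;
        # sstables that saturate first are visited first.
        if s.reads_per_sec <= 0:
            return float("inf")
        return s.max_index_summary_entries / s.reads_per_sec

    for s in sorted(sstables, key=saturation_point):
        if remaining_reads > 0:
            share = int(budget * s.reads_per_sec / remaining_reads)
        else:
            share = 0
        s.num_index_summary_entries = min(share, s.max_index_summary_entries)
        budget -= s.num_index_summary_entries      # capped excess stays in the budget
        remaining_reads -= s.reads_per_sec
    return sstables


# Usage: a pool of 1000 entries shared by one hot and two colder sstables.
tables = [SSTable("hot", 90, 200), SSTable("warm", 9, 5000), SSTable("cold", 1, 5000)]
allocate_summary_entries(tables, total_space=1000 * SPACE_PER_ENTRY)
print([(t.name, t.num_index_summary_entries) for t in tables])
# [('hot', 200), ('warm', 720), ('cold', 80)]
```

Without redistribution the hot sstable would be capped at 200 and the other two would get only 90 and 10 entries, leaving 700 entries of the pool unused. In this sketch an sstable with zero reads gets no entries at all; a real design would presumably keep some minimum sampling.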

bq. Maybe we only rebuild the ones that are X% off of where they should be to 
make it lighter-weight.

That's a good idea. (I was thinking of using a step function.) Instead of "X% 
off of where they should be", I would phrase that more precisely as "X% away 
from their previous proportion".
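One reading of "X% away from their previous proportion" is a relative change in an sstable's share of the pool. A minimal sketch of that check, assuming a relative (not percentage-point) threshold; `needs_rebuild` and its 10% default are hypothetical.

```python
def needs_rebuild(previous_fraction: float, new_fraction: float,
                  threshold: float = 0.1) -> bool:
    """Return True if an sstable's share of the memory pool has moved more than
    `threshold` (0.1 == 10%) away from its previous share, measured relative
    to the previous share."""
    if previous_fraction == 0:
        # Nothing to compare against: rebuild only if it now gets some space.
        return new_fraction > 0
    return abs(new_fraction - previous_fraction) / previous_fraction > threshold


print(needs_rebuild(0.10, 0.105))  # False: a 5% relative change
print(needs_rebuild(0.10, 0.12))   # True: a 20% relative change
```

Sstables whose share barely moves keep their current summary, so a periodic redistribution only pays the rebuild cost where the allocation actually changed.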

bq.  Or if we're downsampling by more than 2x then we can just resample what we 
already have in memory instead of rebuilding "correctly."

If you down-sample in a particular pattern, you can always down-sample using 
only the points already in memory; only up-sampling needs to read from disk.

I'm trying to generalize the down-sampling pattern, but the two main points are 
(assuming 1% granularity):
* For every 1% you down-sample, the number of points to remove from the 
in-memory summary is equal to 1% of the original (on-disk) count
* Each 1% down-sampling run starts at a different offset so that the removed 
points are evenly spaced

For example, to down-sample from 100% to 99%, you would remove every hundredth 
point, starting from index 0.  To down-sample from 99% to 98%, you would remove 
every 99th point, starting from index 50.  To down-sample from 98% to 97%, you 
would remove every 98th point, starting from index 24 or 74, and so on.
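Reading the indices as positions in the current in-memory summary (which is what makes each step remove exactly 1% of the original count), the first three steps of the example can be sketched as below. `downsample_step` is a hypothetical helper, and the offsets are just the ones given above rather than a general rule, since the pattern itself is not yet generalized.

```python
def downsample_step(summary, stride, offset):
    """Drop every `stride`-th entry of the in-memory `summary`, starting at
    position `offset`; only points already in memory are touched."""
    return [entry for i, entry in enumerate(summary)
            if not (i >= offset and (i - offset) % stride == 0)]


# Worked example: 10,000 on-disk entries, identified here by their original index.
original = list(range(10_000))
level_99 = downsample_step(original, stride=100, offset=0)   # 100% -> 99%
level_98 = downsample_step(level_99, stride=99, offset=50)   # 99% -> 98%
level_97 = downsample_step(level_98, stride=98, offset=24)   # 98% -> 97%
print(len(level_99), len(level_98), len(level_97))  # 9900 9800 9700
```

Each step removes 100 entries (1% of the original 10,000), and in every block of 100 original entries the removed ones are indices 0, 51 and 25, so the gaps stay roughly evenly spread. Using offset 74 instead of 24 in the third step would remove index 76 of each block instead.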
                
> Reduce index summary memory use for cold sstables
> -------------------------------------------------
>
>                 Key: CASSANDRA-5519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5519
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 2.0.1
>
>

