[
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732588#comment-13732588
]
Tyler Hobbs commented on CASSANDRA-5519:
----------------------------------------
bq. We could define a fixed-size memory pool, similar to what we do for
memtables or cache, and allocate it to the sstables proportional to their
hotness.
It would be hard to describe this in text, so here's my Pythonic pseudocode for
distributing the fixed-size memory pool:
{noformat}
total_reads_per_sec = sum(sstable.reads_per_sec for sstable in sstables)
sstables_to_downsample = set()
leftover_entries = 0
for sstable in sstables:
    allocated_space = total_space * (sstable.reads_per_sec / total_reads_per_sec)
    # space per entry = token + position + overhead
    num_entries = int(allocated_space / SPACE_PER_ENTRY)
    if num_entries > sstable.max_index_summary_entries:
        # the share exceeds what a full summary needs: cap it and bank the excess
        sstable.num_index_summary_entries = sstable.max_index_summary_entries
        leftover_entries += num_entries - sstable.max_index_summary_entries
    else:
        sstable.num_index_summary_entries = num_entries
        sstables_to_downsample.add(sstable)
# distribute leftover_entries among sstables_to_downsample based on read rates
# (this probably ends up looking like a recursive or iterative function)
{noformat}
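The leftover-distribution step left as a comment above could be made iterative roughly as follows. This is only a sketch under stated assumptions: the `SSTable` dataclass, the `distribute_entries` name, and the 32-byte `SPACE_PER_ENTRY` are hypothetical stand-ins, not Cassandra code. Each pass caps any sstable whose share would exceed its full summary, removes it from the pool, and re-splits what is left among the remaining sstables by read rate.

```python
from dataclasses import dataclass

SPACE_PER_ENTRY = 32  # assumed bytes per entry (token + position + overhead)


@dataclass
class SSTable:  # hypothetical stand-in for the real sstable object
    name: str
    reads_per_sec: float
    max_index_summary_entries: int
    num_index_summary_entries: int = 0


def distribute_entries(sstables, total_space):
    """Split a fixed pool of summary entries among sstables by read rate."""
    pool = total_space // SPACE_PER_ENTRY
    candidates = list(sstables)
    while candidates:
        total_reads = sum(s.reads_per_sec for s in candidates)
        if total_reads == 0:
            break  # no read-rate data; this sketch leaves those sstables as-is
        capped, uncapped = [], []
        for s in candidates:
            share = pool * s.reads_per_sec / total_reads
            (capped if share >= s.max_index_summary_entries else uncapped).append(s)
        if not capped:
            # nobody exceeds a full summary, so the proportional split is final
            for s in uncapped:
                s.num_index_summary_entries = int(pool * s.reads_per_sec / total_reads)
            break
        for s in capped:
            # a full summary is enough; return the excess to the pool
            s.num_index_summary_entries = s.max_index_summary_entries
            pool -= s.max_index_summary_entries
        candidates = uncapped
    return sstables


hot = SSTable("hot", reads_per_sec=80, max_index_summary_entries=500)
warm = SSTable("warm", reads_per_sec=15, max_index_summary_entries=1000)
cold = SSTable("cold", reads_per_sec=5, max_index_summary_entries=1000)
distribute_entries([hot, warm, cold], total_space=32 * 1000)
```

In this example the pool holds 1000 entries. The hot sstable's 80% share would be 800 entries, so it is capped at its full 500, and the remaining 500 are re-split between the other two in a 15:5 ratio.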
bq. Maybe we only rebuild the ones that are X% off of where they should be to
make it lighter-weight.
That's a good idea. (I was thinking of using a step function.) Instead of "X%
off of where they should be", I would more precisely phrase that as "X% away
from their previous proportion".
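A minimal sketch of that check, assuming "X% away" means a relative change against the proportion the summary was last built with (an absolute difference in percentage points would be an equally plausible reading). The function and parameter names are hypothetical.

```python
def needs_resample(previous_proportion, new_proportion, threshold_pct):
    """True if the proportion moved at least threshold_pct percent (relative)."""
    if previous_proportion == 0:
        # nothing was allocated before; any new allocation counts as a change
        return new_proportion > 0
    change = abs(new_proportion - previous_proportion) / previous_proportion
    return change * 100 >= threshold_pct


small_shift = needs_resample(0.20, 0.21, threshold_pct=10)  # 5% relative change
large_shift = needs_resample(0.20, 0.25, threshold_pct=10)  # 25% relative change
```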
bq. Or if we're downsampling by more than 2x then we can just resample what we
already have in memory instead of rebuilding "correctly."
If you down-sample with a particular pattern, you can always down-sample using
just the in-memory points; only up-samples need to read from disk.
I'm trying to generalize the down-sampling pattern, but the two main points are
(assuming 1% granularity):
* For every 1% you down-sample, the number of points to remove from the
in-memory summary is equal to 1% of the original (on-disk) count
* Each 1% down-sampling run starts at a different offset to evenly space the
down-sampling
For example, to down-sample from 100% to 99%, you would remove every hundredth
point, starting from index 0. To down-sample from 99% to 98%, you would remove
every 99th point, starting from index 50. To down-sample from 98% to 97%, you
would remove every 98th point, starting from index 24 or 74, and so on.
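The example above can be sketched in a few lines. This is an illustration of the pattern only, not a generalized scheme: `remove_every` is a hypothetical helper, the indices refer to the current in-memory list at each step, and the 200-entry summary is chosen so that each 1% run removes exactly two points.

```python
def remove_every(points, step, start):
    """Return points without the entries at indices start, start+step, ..."""
    drop = set(range(start, len(points), step))
    return [p for i, p in enumerate(points) if i not in drop]


# Stand-in for a full (100%) summary; values are the original on-disk indices.
summary = list(range(200))

at_99 = remove_every(summary, step=100, start=0)  # 100% -> 99%
at_98 = remove_every(at_99, step=99, start=50)    # 99% -> 98%
at_97 = remove_every(at_98, step=98, start=24)    # 98% -> 97%
```

Each run drops 1% of the original count (2 of 200), and the shifting start offset keeps the removed points spread out rather than clustered at the front of each block.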
> Reduce index summary memory use for cold sstables
> -------------------------------------------------
>
> Key: CASSANDRA-5519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5519
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 2.0.1
>
>