[ 
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244390#comment-15244390
 ] 

Benedict commented on CASSANDRA-11452:
--------------------------------------

bq.  though a rough calculation indicates it isn't a huge savings

Assuming a basic CLHM, with CompressedOops on a 64-bit VM (Cassandra's 
defaults) I calculate overhead inflation of around 22% - I reckon 72 bytes are 
needed vs a possible 56 (once the 12 byte overheads are removed, and alignment 
accounted for).  You'd also be able to avoid recalculating the hash for the 
sketches since it's memoized in CHM.  Admittedly I don't 100% vouch for the 
accuracy of those calculations as I'm doing it from memory.
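
As a rough check of the arithmetic, here is a minimal sketch. Only the two totals (72 and 56 bytes) come from the estimate above; the class name, the helper names and the 8-byte alignment helper are illustrative assumptions, and the per-field breakdown is deliberately not reproduced. The 16-byte difference is about 22% of the current 72 bytes, or about 29% when measured against the 56-byte target.

```java
// Worked arithmetic for the overhead estimate. Hypothetical helper class,
// not part of Cassandra or CLHM.
final class EntryOverhead {
    static final int CURRENT_BYTES = 72;   // estimated per-entry overhead today
    static final int POSSIBLE_BYTES = 56;  // estimate once headers are merged and re-aligned

    // Round a size up to HotSpot's default 8-byte object alignment.
    static int align8(int bytes) {
        return (bytes + 7) & ~7;
    }

    // Saving expressed as a fraction of the current overhead: 16 / 72.
    static double savingFraction() {
        return (CURRENT_BYTES - POSSIBLE_BYTES) / (double) CURRENT_BYTES;
    }

    // Same difference expressed relative to the smaller layout: 16 / 56.
    static double inflationFraction() {
        return (CURRENT_BYTES - POSSIBLE_BYTES) / (double) POSSIBLE_BYTES;
    }
}
```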

I absolutely am not suggesting your calculation of cost/benefit is wrong 
though, or that I would even have arrived at a different conclusion.  Certainly 
the user key/value sizes further amortize that overhead inflation, and for many 
workloads the distinction is barely perceptible.

bq. What do you think about combining the approach

I assume you mean the inversion of that guard.  It's a shame we don't have 
access to the CHM to do the sampling, as that would make it robust to scans 
since all the members of the LRU would have high frequencies.  My only slight 
concern is that we may have to wait 10s of thousands of rejections to cycle out 
the collision, which is quite slow to respond. By raising the chance we harm 
scans though.  A couple of other options:

# Randomly sample the frequency of, say, 1% of the items we admit (on 
admission, storing the last 16 or so); on demand, compute the low quartile
# On demand, sample a random short run of the sketch when we encounter this 
situation, and compute some percentile (which one needs some thought)
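
To make the two options concrete, here is a minimal sketch under stated assumptions. The class and method names are hypothetical, the sketch is modelled as a plain `int[]` of counters (a real frequency sketch packs its counters differently), and the quartile and percentile are taken by simple nearest-rank indexing.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not an existing Cassandra or CLHM class.
final class AdmissionFrequencySampler {
    private final int[] recent = new int[16]; // last 16 or so sampled frequencies
    private long recorded;                    // total samples recorded so far
    private final double sampleRate;          // e.g. 0.01 for 1% of admissions
    private final Random random;

    AdmissionFrequencySampler(double sampleRate, Random random) {
        this.sampleRate = sampleRate;
        this.random = random;
    }

    // Option 1: called on every admission; keeps roughly sampleRate of them.
    void onAdmit(int frequency) {
        if (random.nextDouble() < sampleRate) {
            record(frequency);
        }
    }

    // Store a sample in the ring buffer, overwriting the oldest one.
    void record(int frequency) {
        recent[(int) (recorded % recent.length)] = frequency;
        recorded++;
    }

    // Option 1, on demand: low quartile of the stored samples (0 if none yet).
    int lowQuartile() {
        int n = (int) Math.min(recorded, recent.length);
        if (n == 0) {
            return 0;
        }
        int[] sorted = Arrays.copyOf(recent, n);
        Arrays.sort(sorted);
        return sorted[(n - 1) / 4];
    }

    // Option 2: percentile (0.0 to 1.0) of a random contiguous run of counters.
    // Assumes counters is non-empty and runLength is at least 1.
    static int percentileOfRandomRun(int[] counters, int runLength,
                                     double percentile, Random random) {
        int n = Math.min(runLength, counters.length);
        int start = random.nextInt(counters.length - n + 1);
        int[] run = Arrays.copyOfRange(counters, start, start + n);
        Arrays.sort(run);
        return run[(int) Math.floor(percentile * (n - 1))];
    }
}
```

The ring buffer keeps the cost of option 1 constant per admission, while option 2 needs no extra state at all but pays for a copy and sort each time it is invoked.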

Then either for 1% of admissions, or when your current guard is triggered, 
compute this statistic for the guard.  For absolute security, for say 0.01% of 
candidates, admit without any check.
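
The unconditional escape hatch can be sketched as a thin wrapper around whatever admission check is in place. Everything named here (`AdmissionCheck`, `BypassingAdmission`, the frequency parameters) is a hypothetical illustration, not the existing guard; the wrapped check is left abstract because its exact form is still under discussion.

```java
import java.util.Random;

// Hypothetical abstraction over the existing admission decision.
interface AdmissionCheck {
    boolean admit(int candidateFrequency, int victimFrequency);
}

// Admits a small random fraction of candidates without consulting the
// wrapped check, so no input pattern can block admission indefinitely.
final class BypassingAdmission implements AdmissionCheck {
    private final AdmissionCheck delegate;
    private final double bypassRate; // e.g. 0.0001 for 0.01% of candidates
    private final Random random;

    BypassingAdmission(AdmissionCheck delegate, double bypassRate, Random random) {
        this.delegate = delegate;
        this.bypassRate = bypassRate;
        this.random = random;
    }

    @Override
    public boolean admit(int candidateFrequency, int victimFrequency) {
        if (random.nextDouble() < bypassRate) {
            return true; // unconditional admission
        }
        return delegate.admit(candidateFrequency, victimFrequency);
    }
}
```

With a bypass rate of 0.0001, a candidate that the inner check always rejects would still get in after about 10,000 attempts on average, which bounds how long a collision can hold its slot.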

That all said, I expect for Cassandra's purposes many of the proposed solutions 
so far will be sufficient, and I certainly wouldn't have any problem with the 
solution you propose.

> Cache implementation using LIRS eviction for in-process page cache
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-11452
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid 
> having to explicitly mark compaction accesses as non-cacheable, we need a 
> cache implementation that uses an eviction algorithm that can better handle 
> non-recurring accesses.



