[
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244140#comment-15244140
]
Benedict commented on CASSANDRA-11452:
--------------------------------------
bq. The hash table trick isn't applicable since I didn't fork it for Caffeine
or CLHM.
Ah; last time I looked (many years ago) I'm sure your CLHM was a fork; I hadn't
realised you'd reverted to stock. I'm not certain the trade-off is so
clear-cut, since the per-item overheads are higher without forking: there is no
way to specialise the entries of a CHM, so we pay object overheads multiple
times per entry, which is likely costly for small items. But it's no huge
matter, and of course there are significant downsides either to merging changes
from upstream or to maintaining your own implementation of a concurrent hash
map.
bq. For larger workloads (database, search, oltp) it's equivalent, as we'd
expect. (multi1: -4%, multi3: +3%, gli: -2.5%, cs: -2%)
Just to clarify: those numbers are for small workloads?
I'd say that a small net loss on small caches is not something to worry about,
and I'm personally against too many tuning knobs (which any such value would
probably need to be). However, if we really cared, we could do one of two
things:
# Make the admission decision based on min(victim, guard); i.e., only apply the
guard logic if the victim does not already permit entry. Assuming the problem
in small traces is occasionally hitting a _high_-frequency value in the guard
when the victim's frequency was not high (which is the only explanation that
makes sense - the alternative is blind, unintended random coincidence in the
trace, which I don't think we can really account for), this should have the
same effect as the threshold for triggering the random walk. See the first
sketch after this list.
# We could gradate the introduction of the behaviour based on cache size. For
instance, we could divide the walk distance by the ratio of integer bits to
cache-size bits: if the cache contains 64K elements we would divide by 2, but
if it contains 512 elements we would divide by 4. This would mean ~94% of walks
would check the victim, ~5% the next entry, and all but a tiny fraction the
third. It would also have the effect of slightly increasing protection as
caches grow much larger and collisions become more likely. See the second
sketch after this list.
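To illustrate the first option, a minimal sketch (the method and parameter names are hypothetical; this is not Caffeine's API): the admission threshold becomes min(victimFreq, guardFreq), so the guard can only relax admission, never make it stricter than plain victim-based admission:

{code:java}
// Hedged sketch of option 1: the guard is consulted only when the victim
// alone would reject the candidate, which is equivalent to admitting when
// candidateFreq > Math.min(victimFreq, guardFreq).
static boolean admit(int candidateFreq, int victimFreq, int guardFreq) {
  if (candidateFreq > victimFreq) {
    return true;                     // victim already permits entry
  }
  return candidateFreq > guardFreq;  // apply guard logic only on rejection
}
{code}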
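And for the second option, a hedged sketch of the scaling (drawing the walk distance as a geometric sample via numberOfTrailingZeros is my assumption about the mechanism, not something the ticket specifies):

{code:java}
import java.util.concurrent.ThreadLocalRandom;

final class ScaledWalk {
  // Hedged sketch of option 2: divide the walk distance by the ratio of
  // integer bits to cache-size bits, rounded up. Assumes cacheSize >= 2.
  static int walkDistance(int cacheSize) {
    // log2 of the cache size: 16 for 64K entries, 9 for 512 entries
    int sizeBits = Integer.SIZE - Integer.numberOfLeadingZeros(cacheSize - 1);
    // 32/16 = 2 for a 64K cache; ceil(32/9) = 4 for a 512-entry cache
    int divisor = (Integer.SIZE + sizeBits - 1) / sizeBits;
    // geometric sample: P(k trailing zeros) = 2^-(k+1)
    int geometric =
        Integer.numberOfTrailingZeros(ThreadLocalRandom.current().nextInt());
    // with divisor 4: ~94% of walks stop at the victim (distance 0),
    // ~5-6% at the next entry, and all but ~0.4% within the first three
    return geometric / divisor;
  }
}
{code}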
> Cache implementation using LIRS eviction for in-process page cache
> ------------------------------------------------------------------
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Branimir Lambov
> Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863: to make best use of caching and to avoid
> having to explicitly mark compaction accesses as non-cacheable, we need a
> cache implementation whose eviction algorithm can better handle
> non-recurring accesses.