[
https://issues.apache.org/jira/browse/CASSANDRA-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255723#comment-15255723
]
Ben Manes commented on CASSANDRA-11452:
---------------------------------------
Chatted with Roy, and below is his analysis. I tried simulating his gratuitous
access idea, which had a positive impact on multi3, a negative impact on DS1
(database), and a negligible impact on all other traces. It seems he dislikes
the probability approach in the code above, while I dislike the additional
space required by (6).
Longer term, I'd like to devise a way to adaptively size the admission window.
I think that, if done right, the strategy could help alleviate or resolve the
attack. My current idea is to use a bloom filter that is reset after a small
sample period and captures the candidates that were rejected by TinyLFU. If
multiple rejections for a key are observed within a sample, then the candidate
is admitted and the window size is increased. If no repeat rejections are
observed within a larger sample (e.g. 3x), then the window is decreased. This
would follow some ARC-like equation with a min/max for the window. Admitting
the candidate would resolve this attack, which would then only result in making
the window oversized for that workload.
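A rough sketch of that idea, just to make it concrete. The BitSet stand-in for
the bloom filter, the step sizes, and the sample bookkeeping are all
placeholders here, not anything from Caffeine:
{code:java}
import java.util.BitSet;

final class AdaptiveWindow {
  private final BitSet rejections = new BitSet(1 << 16); // stand-in for a real bloom filter
  private final int minWindow, maxWindow, sampleSize;
  private int windowSize;
  private int sampleCount;
  private boolean sawRepeatRejection;

  AdaptiveWindow(int initial, int min, int max, int sampleSize) {
    this.windowSize = initial;
    this.minWindow = min;
    this.maxWindow = max;
    this.sampleSize = sampleSize;
  }

  /** Called when TinyLFU rejects a candidate; returns true to admit it anyway. */
  boolean onReject(Object candidate) {
    int slot = (candidate.hashCode() & 0x7fffffff) % (1 << 16);
    boolean repeated = rejections.get(slot);
    rejections.set(slot);
    if (++sampleCount >= sampleSize) { // reset the filter each sample period
      rejections.clear();
      sampleCount = 0;
    }
    if (repeated) { // rejected twice within one sample: admit and grow the window
      sawRepeatRejection = true;
      windowSize = Math.min(maxWindow, windowSize + 1);
      return true;
    }
    return false;
  }

  /** Called at the end of a larger sample (e.g. 3x); shrinks the window if no repeats were seen. */
  void onLargeSampleEnd() {
    if (!sawRepeatRejection) {
      windowSize = Math.max(minWindow, windowSize - 1);
    }
    sawRepeatRejection = false;
  }
}
{code}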
----
Here is my summary.
Assumptions:
1) An oblivious adversary that cannot see which items are in the cache and does
not control which items get into the cache; its only way of impacting what's in
the cache is by issuing requests for items itself.
2) Caffeine uses the hash code of an object as its identifier for the frequency
histogram management, so if two items have the same hash code, they will access
the same locations in the CMS/CBF, since the respective hash functions are
computed on the objects' hash codes.
Attack: The attacker can create two items O1 and O2 whose hash codes are the
same. It issues frequent requests for both objects until both are admitted to
the cache. Once this happens, the attacker stops issuing requests for O2 but
continues generating enough requests for O1 that it remains one of the most
frequent items. As O2 is no longer being requested, it is demoted to the bottom
of the LRU/SLRU eviction mechanism, so from some point on it is always the
chosen cache victim. However, due to the hash code collision and the high
frequency count of O1, TinyLFU will never allow O2 to be evicted, and at that
point the cache is completely nullified.
Notice that the attacker needs to construct two objects whose hash codes
collide, or at least whose CMS hash values all collide. In the other direction,
it is not enough to find two values whose CMS hash values collide – one should
create objects whose hash codes match these.
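To illustrate how cheap constructing such a collision is in Java (String is
just the simplest case; any class with a weak hashCode works the same way):
{code:java}
// String.hashCode() is the polynomial h = 31*h + c, so "Aa" and "BB"
// collide (both hash to 2112), and the collision composes: concatenating
// colliding blocks yields arbitrarily many distinct colliding keys.
String o1 = "Aa";
String o2 = "BB";
assert o1.hashCode() == o2.hashCode(); // 2112 == 2112
assert "AaAa".hashCode() == "BBBB".hashCode();
{code}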
Possible solutions:
1. A domain-specific solution: Use another identifier; e.g., in a YouTube
cache, one can use the video identifier, which Google ensures is unique.
2. Java enhancement solution: One of the main problems with Java hash codes is
that they are only 32 bits long. It may be possible to enhance them to 64 bits,
but that is not realistic to expect.
3. Jitter: Instead of always replacing the least recently used item, pick one
of the lowest k. The problem with this solution is that, in order not to hurt
performance, k should be kept small. However, in this case, the attacker can
nullify 1/k of the cache with no additional effort, or the entire cache with k
times the effort.
4. Random eviction: As the name suggests, use random eviction instead of LRU.
The problem is that in some workloads, this would hurt performance.
5. Circular eviction: Each time, evict the next position in the LRU queue
rather than the last one. This has the same impact as random eviction but is
faster to implement, and the same performance concern applies.
6. Ignore the filter on cache collisions: If there is a hash-code collision on
the cache victim, we ignore TinyLFU. The impact on performance is probably
marginal, since the chance of an unintentional collision is small, and it would
definitely not show up in the traces. But how do we detect the collision? Do we
maintain a non-counting TinyTable that stores the entire hash codes? That adds
space. On the other hand, the cache probably maintains a map of all items in
any case – so we can build on this, in which case it simply costs another
lookup. I would not worry about false positives here, since in a naïve
(non-adversarial) trace the chances are very small, and the chances that both
items are heavy hitters is negligible, so in the worst case we will suffer one
extra miss over the entire trace.
7. Count limit: If TinyLFU rejects the replacement for k consecutive attempts,
ignore TinyLFU. But what should k be? This also incurs a small performance
impact.
8. Probabilistic bypass: Ignore TinyLFU with some small probability. Same
concerns as above; also, the attacker could repeat the trick each time his item
gets evicted, hurting performance.
9. Gratuitous access: If the cache victim wins too many times in the LFU filter
test, we make a gratuitous access to this item. This way, it will take some
time until it reaches the bottom of the LRU stack again. The important thing is
that we do not evict it, so we do not pollute the cache with tail items unless
the next item in line is also a tail item. It also requires only a single
counter, so the space and time overheads are minimal.
Personally, I do not like any solution that hurts performance in the normal
case just to solve this hypothetical attack.
Therefore, I vote for the "Ignore the filter on cache collisions" option, and
even this, if possible, should be a parameter and not the default. I also like
the fact that it tackles the problem at its source rather than with an indirect
hack. Maybe, even here, perform the check only after the victim has survived
eviction multiple times, to reduce the performance impact even further.
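A minimal sketch of what option (6) could look like, assuming a small secondary
multiset for detecting resident collisions; the names, the FrequencySketch
interface, and the tracker are all illustrative, not Caffeine's actual
structures:
{code:java}
import java.util.HashMap;
import java.util.Map;

interface FrequencySketch {
  int frequency(Object key); // estimated access count from the CMS/CBF
}

final class CollisionAwareAdmittor {
  // hashCode -> number of resident keys sharing it
  private final Map<Integer, Integer> residents = new HashMap<>();
  private final FrequencySketch sketch;

  CollisionAwareAdmittor(FrequencySketch sketch) {
    this.sketch = sketch;
  }

  void onInsert(Object key) {
    residents.merge(key.hashCode(), 1, Integer::sum);
  }

  void onRemove(Object key) {
    residents.merge(key.hashCode(), -1, (a, b) -> a + b == 0 ? null : a + b);
  }

  /** Returns true if the candidate should replace the victim. */
  boolean admit(Object candidate, Object victim) {
    // If another resident entry shares the victim's hash code, the sketch
    // cannot tell the two apart, so its frequency estimate for the victim
    // is untrustworthy; fall back to plain LRU and evict the victim.
    if (residents.getOrDefault(victim.hashCode(), 0) > 1) {
      return true;
    }
    return sketch.frequency(candidate) > sketch.frequency(victim);
  }
}
{code}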
I also find the gratuitous access idea interesting, but we need to check its
impact on performance.
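The single counter for option (9) could look roughly like this; the threshold
and names are placeholders:
{code:java}
final class GratuitousAccessPolicy {
  private static final int MAX_WINS = 8; // arbitrary placeholder threshold
  private Object lastVictim;
  private int wins; // the single counter: consecutive filter wins by one victim

  /** Called when TinyLFU rejects the candidate in favor of the victim. */
  boolean shouldTouchVictim(Object victim) {
    if (victim.equals(lastVictim)) {
      if (++wins >= MAX_WINS) {
        wins = 0;
        return true; // caller makes a gratuitous access, moving the victim up the LRU stack
      }
    } else {
      lastVictim = victim;
      wins = 1;
    }
    return false;
  }
}
{code}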
Of course, in applications of TinyLFU in which item IDs are unique (e.g.,
YouTube), use this ID and not a hash code, and there is no problem.
Cheers,
Roy
> Cache implementation using LIRS eviction for in-process page cache
> ------------------------------------------------------------------
>
> Key: CASSANDRA-11452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11452
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Branimir Lambov
> Assignee: Branimir Lambov
>
> Following up from CASSANDRA-5863, to make best use of caching and to avoid
> having to explicitly mark compaction accesses as non-cacheable, we need a
> cache implementation that uses an eviction algorithm that can better handle
> non-recurring accesses.