[ 
https://issues.apache.org/jira/browse/CASSANDRA-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413548#comment-17413548
 ] 

Aleksei Zotov commented on CASSANDRA-15153:
-------------------------------------------

I looked into the issue a bit further and can confirm that there is a bug in the 
library. The explanation is as follows:
 - records are physically removed from the cache asynchronously
 - in order not to return stale records, the library relies on the 
{{writeTime}} field
 - it checks whether a record is expired based on {{writeTime}} on every 
{{get}} call
 - if the record is expired it tries to load the actual value using 
{{CacheLoader}}
 - if the value is available, it is cached and returned; if not, the expired 
data is physically removed and {{null}} is returned; if the loader throws an 
exception, the problem appears

The problem is that the code updates {{writeTime}} on expiration 
([https://github.com/ben-manes/caffeine/blob/v2.3.5/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L2002])
 before checking whether it can load the actual value. If an exception occurs 
while loading the actual record 
([https://github.com/ben-manes/caffeine/blob/v2.3.5/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L2008]),
 the method exits abruptly (the exception is re-thrown to the caller). 
However, the expired record already has its {{writeTime}} updated, so it looks 
fresh again. In effect, the expired record is resurrected.
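
To make the ordering problem easier to follow, here is a minimal toy model of it. This is only a sketch and not Caffeine's actual code: the {{ToyCache}} class, the {{touchBeforeLoad}} flag, the manual {{now}} clock and the helper method are all hypothetical names made up for the illustration. It only mimics the sequence described above (refresh the timestamp, then call the loader).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model only: names and structure are hypothetical, not Caffeine's code.
class ResurrectionSketch {

    static final class Entry {
        final String value;
        long writeTime;
        Entry(String value, long writeTime) {
            this.value = value;
            this.writeTime = writeTime;
        }
    }

    static final class ToyCache {
        private final Map<String, Entry> entries = new HashMap<>();
        private final long ttl;
        private final boolean touchBeforeLoad; // true = problematic ordering
        long now = 0;                          // manual clock for the demo

        ToyCache(long ttl, boolean touchBeforeLoad) {
            this.ttl = ttl;
            this.touchBeforeLoad = touchBeforeLoad;
        }

        void put(String key, String value) {
            entries.put(key, new Entry(value, now));
        }

        String get(String key, Function<String, String> loader) {
            Entry e = entries.get(key);
            if (e != null && now - e.writeTime < ttl) {
                return e.value;                // not expired: serve cached value
            }
            if (e != null && touchBeforeLoad) {
                e.writeTime = now;             // timestamp refreshed BEFORE the load
            }
            String loaded = loader.apply(key); // may throw; the exception propagates
            if (loaded == null) {
                entries.remove(key);           // nothing to load: drop expired entry
                return null;
            }
            put(key, loaded);                  // load succeeded: cache fresh value
            return loaded;
        }
    }

    // Returns what a second read sees after a reload of an expired entry failed.
    static String secondReadAfterFailedReload(boolean touchBeforeLoad) {
        ToyCache cache = new ToyCache(10, touchBeforeLoad);
        cache.put("role", "old-permissions");
        cache.now = 20;                        // entry is now expired
        Function<String, String> failing = k -> {
            throw new IllegalStateException("backend unavailable");
        };
        try {
            cache.get("role", failing);        // first read: reload fails
        } catch (IllegalStateException expected) {
            // the caller sees the failure, as intended
        }
        cache.now = 21;
        try {
            return cache.get("role", failing); // second read
        } catch (IllegalStateException e) {
            return "EXCEPTION";
        }
    }

    public static void main(String[] args) {
        System.out.println(secondReadAfterFailedReload(true));
        System.out.println(secondReadAfterFailedReload(false));
    }
}
```

In this model, with {{touchBeforeLoad}} set to {{true}} the second read returns the stale value even though the loader is still failing; with {{false}} the second read fails again, which is the behaviour one would expect.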

As [~eperott] correctly mentioned, the whole expiration approach was rewritten 
in [2.5.0|https://github.com/ben-manes/caffeine/releases/tag/v2.5.0] and the 
issue is probably fixed there (I have not tested it, though). So the only fix 
is to update _Caffeine_ to a newer version. As far as I can see, the current 
version is 3.0.3. I believe it makes sense to move to the latest version 
because it has a bunch of fixes and perf improvements.

[~blerer] [~mck] 

I have a couple of questions:
 # It is clearly a minor bug. Currently the expiration logic is used in the 
{{AuthCache}} and {{ActiveRepairService}} classes. There are a few more classes 
that use _Caffeine_, but they probably won't be changed. In fact, the library 
upgrade seems to be hard to test, since possible issues (in the library itself) 
can probably be reproduced only in a concurrent environment and under a certain 
load (aka production). So it is hard for me to assess the risk level of 
backporting it to the old versions. At the moment all versions starting from 
3.0 seem to be affected. With that being said, I'm wondering which versions 
we need to apply the fix to.
 # Are we good to go with the latest _Caffeine_ version? Alternatively, we can 
use a hybrid approach - update old versions to 2.5.0 (the first version where 
the problem seems to be fixed) and 4.1 to 3.0.3 (the latest version) - if that 
seems to be safer.

Please share your thoughts.

> Caffeine cache return stale entries
> -----------------------------------
>
>                 Key: CASSANDRA-15153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15153
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Authorization
>            Reporter: Per Otterström
>            Priority: Normal
>              Labels: security
>
> Version 2.3.5 of the Caffeine cache that we're using in various places can 
> hand out stale entries in some cases. This seems to happen when an update 
> fails repeatedly, in which case Caffeine may return a previously loaded 
> value. For instance, the AuthCache may hand out permissions even though the 
> reload operation is failing, see CASSANDRA-15041.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
