Huapeng Yuan created CASSANDRA-18433:
----------------------------------------

             Summary: Row cache inconsistency issue: A read can put stale data into row cache in a race condition
                 Key: CASSANDRA-18433
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18433
             Project: Cassandra
          Issue Type: Bug
          Components: Local/Caching
            Reporter: Huapeng Yuan


We found this issue in our production system, which runs version 3.11.6. When
we perform an update and then read immediately afterwards, we sometimes read
stale data. The same issue occurs with write ALL + read ONE consistency and
with write QUORUM + read QUORUM. The issue goes away once we disable the row
cache.

The row cache configuration:

caching = \{'keys': 'ALL', 'rows_per_partition': 'ALL'}

After some investigation, we believe there is a race condition between the
read and write paths. The problem:

When one thread reads and another thread writes the same partition (for
example, two rows with the same partition key), the read thread may load stale
data into the row cache for the row that is being updated.
{panel:title=The steps of the write thread inserting a row into partition p}
{{W-Step 1}}: Inserts the value v1 into the memtable.
{{W-Step 2}}: Invalidates the row cache entry for the partition key.{panel}
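The two write steps can be modelled with plain Java maps. This is a minimal sketch under stated assumptions, not Cassandra code: `WritePathSketch`, `memtable` and `rowCache` are hypothetical stand-ins for the memtable and the row cache. What it illustrates is that, in this model, the invalidation in W-Step 2 is an unconditional removal, so it also discards a sentinel that a concurrent reader may have installed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of W-Step 1 and W-Step 2; not Cassandra's implementation.
public class WritePathSketch {
    // Stand-in for the memtable: partition key -> latest value.
    final ConcurrentMap<String, String> memtable = new ConcurrentHashMap<>();
    // Stand-in for the row cache: partition key -> cached value or a reader's sentinel.
    final ConcurrentMap<String, Object> rowCache = new ConcurrentHashMap<>();

    void write(String partitionKey, String value) {
        // W-Step 1: apply the update to the memtable.
        memtable.put(partitionKey, value);
        // W-Step 2: invalidate the partition in the row cache. The removal is
        // unconditional, so a sentinel placed by a concurrent read is removed too.
        rowCache.remove(partitionKey);
    }
}
```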
{panel:title=The steps of the read thread reading a row from partition p}
{{R-Step 1}}: Checks the row cache. If the row is not present, goes to {{R-Step 2}}.
{{R-Step 2}}: Inserts a sentinel (a timestamp) into the row cache as the row's value, telling other read threads to skip the row cache.
{{R-Step 3}}: Reads from the storage layer and gets value v0, which can be older than v1.
{{R-Step 4}}: Inserts v0 into the row cache for the row if the row is absent from the cache or still holds the same sentinel. *This step causes the inconsistency: the stale value should not be inserted if the sentinel is no longer in the row cache.*{panel}
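A minimal Java sketch of the read path, assuming a `ConcurrentHashMap` in place of the real row cache; `ReadPathSketch`, `Sentinel`, `storage` and the `fixed` flag are hypothetical names, not Cassandra APIs. With `fixed = false`, R-Step 4 applies the reported check (entry absent or still holding the same sentinel). With `fixed = true`, it uses `ConcurrentHashMap.replace(key, oldValue, newValue)`, which swaps the sentinel for the value only if the sentinel is still present, as R-Step 4 suggests.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical model of R-Step 1..4; not Cassandra's implementation.
public class ReadPathSketch {
    // Marker a reader puts in the cache while it loads the row (R-Step 2).
    // Default Object equality is identity, so each read has its own sentinel.
    static final class Sentinel {}

    final ConcurrentMap<String, Object> rowCache = new ConcurrentHashMap<>();
    final Function<String, String> storage; // stand-in for the storage layer
    final boolean fixed; // false: reported R-Step 4 check; true: sentinel-only replace

    ReadPathSketch(Function<String, String> storage, boolean fixed) {
        this.storage = storage;
        this.fixed = fixed;
    }

    String read(String key) {
        // R-Step 1: serve a real cached value if there is one.
        Object cached = rowCache.get(key);
        if (cached instanceof String) return (String) cached;
        // Another reader's sentinel: skip the cache and read storage directly.
        if (cached instanceof Sentinel) return storage.apply(key);

        // R-Step 2: claim the entry with our own sentinel.
        Sentinel sentinel = new Sentinel();
        if (rowCache.putIfAbsent(key, sentinel) != null) return storage.apply(key);

        // R-Step 3: read from storage; the result may be overwritten by a writer
        // before we cache it.
        String value = storage.apply(key);

        // R-Step 4: try to cache what we read.
        cacheResult(key, sentinel, value);
        return value;
    }

    void cacheResult(String key, Sentinel sentinel, String value) {
        if (fixed) {
            // Only replace our own sentinel. If a writer invalidated the entry,
            // the sentinel is gone and nothing is cached.
            rowCache.replace(key, sentinel, value);
        } else {
            // Reported check: cache if the entry is absent OR still our sentinel.
            // "Absent" can mean a writer removed our sentinel, so a stale value
            // can be cached here.
            rowCache.compute(key, (k, cur) -> (cur == null || cur == sentinel) ? value : cur);
        }
    }
}
```

The sentinel is compared by identity, so in this sketch a read can only ever replace its own marker, never another reader's sentinel or a value that has since been cached.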
{panel:title=The sequence to reproduce the issue}
{{R-Step 1}}
{{R-Step 2}}
{{R-Step 3}}
{{W-Step 1}}
{{W-Step 2}}
{{R-Step 4}}{panel}
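The interleaving above can be replayed deterministically on a single thread. This is a minimal sketch, again using plain maps as hypothetical stand-ins for storage and the row cache (`RaceReplay` and `replay` are illustrative names), with partition key `p` and values `v0`/`v1` as in the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical single-threaded replay of R1, R2, R3, W1, W2, R4; not Cassandra code.
public class RaceReplay {
    // Returns what the row cache holds for "p" once the sequence has run.
    static Object replay(boolean fixed) {
        Map<String, String> storage = new HashMap<>();
        ConcurrentMap<String, Object> rowCache = new ConcurrentHashMap<>();
        storage.put("p", "v0"); // value before the update

        // R-Step 1: the row is not cached, so the reader continues.
        if (rowCache.containsKey("p")) return rowCache.get("p");
        // R-Step 2: the reader installs its sentinel.
        Object sentinel = new Object();
        rowCache.putIfAbsent("p", sentinel);
        // R-Step 3: the reader reads v0 from storage.
        String read = storage.get("p");

        // W-Step 1: the writer applies v1.
        storage.put("p", "v1");
        // W-Step 2: the writer invalidates the partition, removing the sentinel.
        rowCache.remove("p");

        // R-Step 4: the reader tries to cache what it read.
        if (fixed) {
            rowCache.replace("p", sentinel, read); // no-op: the sentinel is gone
        } else {
            rowCache.compute("p", (k, cur) -> (cur == null || cur == sentinel) ? read : cur);
        }
        return rowCache.get("p");
    }

    public static void main(String[] args) {
        System.out.println("reported behaviour: " + replay(false));
        System.out.println("sentinel-only replace: " + replay(true));
    }
}
```

Running `main` prints `reported behaviour: v0` and `sentinel-only replace: null`. With the reported check the cache ends up holding v0 while storage holds v1; with the sentinel-only replace the entry stays empty, so the next read goes to storage and sees v1.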



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
