Benedict Elliott Smith created CASSANDRA-15369:
--------------------------------------------------

             Summary: Fake row deletions and range tombstones, causing digest 
mismatch and sstable growth
                 Key: CASSANDRA-15369
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Coordination, Local/Memtable, Local/SSTable
            Reporter: Benedict Elliott Smith


As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
tombstone markers under various circumstances:
 * If we perform a clustering key query (or select a compact column):
 ** Serving from a {{Memtable}}, we will generate fake row deletions
 ** Serving from an sstable, we will generate fake row tombstone markers


 * If we perform a slice query, we will generate only fake row tombstone 
markers for any range tombstone that begins or ends outside of the limit of the 
requested slice
 * If we perform a multi-slice or IN query, this will occur for each 
slice/clustering
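
As a rough illustration of the slice case, the toy model below clips one 
range tombstone to each requested slice.  It is only a sketch: {{Range}}, 
{{clip}} and {{isFake}} are hypothetical names, integer clusterings and 
inclusive bounds are assumptions, and none of this is Cassandra's actual read 
path.

```java
import java.util.Optional;

public class SliceClipping {
    /** Hypothetical inclusive clustering range [start, end]; not a Cassandra type. */
    static final class Range {
        final int start, end;
        Range(int start, int end) { this.start = start; this.end = end; }
        @Override public boolean equals(Object o) {
            return o instanceof Range && ((Range) o).start == start && ((Range) o).end == end;
        }
        @Override public int hashCode() { return 31 * start + end; }
        @Override public String toString() { return "[" + start + ", " + end + "]"; }
    }

    /** Bounds of the tombstone as seen through the slice; empty if they do not overlap. */
    static Optional<Range> clip(Range tombstone, Range slice) {
        int start = Math.max(tombstone.start, slice.start);
        int end = Math.min(tombstone.end, slice.end);
        return start <= end ? Optional.of(new Range(start, end)) : Optional.empty();
    }

    /** A returned range is "fake" when it differs from the tombstone that was written. */
    static boolean isFake(Range tombstone, Range returned) {
        return !tombstone.equals(returned);
    }

    public static void main(String[] args) {
        Range tombstone = new Range(0, 100);
        // A multi-slice query repeats the clipping once per slice.
        Range[] slices = { new Range(3, 8), new Range(50, 200), new Range(0, 100) };
        for (Range slice : slices) {
            clip(tombstone, slice).ifPresent(r ->
                System.out.println("slice " + slice + " -> " + r + ", fake=" + isFake(tombstone, r)));
        }
    }
}
```

In this model only a slice that covers the whole tombstone returns its real 
bounds; every narrower or partially overlapping slice yields a different, 
clipped range.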

Unfortunately, these different behaviours can lead to very different data 
stored in sstables until a full repair is run.  When we read-repair, we only 
send these fake deletions or range tombstones.  A fake row deletion, a 
clustering RT and a slice RT each produce a different digest.  So for each 
single point lookup we can produce a digest mismatch twice, and until a full 
repair is run we can encounter an unlimited number of digest mismatches across 
different overlapping queries.
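
The digest problem can be sketched with a toy model in which each replica 
hashes whatever representation of the deletion it happens to return.  The 
string encodings, the helper names and the use of MD5 here are all 
assumptions made for illustration; they are not Cassandra's serialization 
format or digest code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class FakeDeletionDigests {
    /** Hash a hypothetical textual encoding of a replica's response. */
    static byte[] digest(String encodedResponse) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(encodedResponse.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical encoding of a row deletion at one clustering. */
    static String rowDeletion(int clustering, long timestamp) {
        return "ROW_DELETION ck=" + clustering + " ts=" + timestamp;
    }

    /** Hypothetical encoding of a range tombstone with inclusive bounds. */
    static String rangeTombstone(int start, int end, long timestamp) {
        return "RT [" + start + ", " + end + "] ts=" + timestamp;
    }

    public static void main(String[] args) {
        long ts = 1000L;
        // The same underlying deletion, as three different reads might surface it.
        byte[] fromMemtable = digest(rowDeletion(5, ts));        // fake row deletion
        byte[] fromSSTable  = digest(rangeTombstone(5, 5, ts));  // clustering RT
        byte[] fromSlice    = digest(rangeTombstone(3, 8, ts));  // slice RT

        System.out.println("memtable == sstable: " + Arrays.equals(fromMemtable, fromSSTable));
        System.out.println("sstable  == slice:   " + Arrays.equals(fromSSTable, fromSlice));
    }
}
```

Because the three encodings differ, so do their digests, even though all 
three describe the same deletion at clustering 5.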

Relatedly, this seems a more problematic variant of our atomicity failures 
caused by our monotonic reads, since RTs can have an atomic effect across (up 
to) the entire partition, whereas the propagation may happen on an arbitrarily 
small portion.  If the RT exists on only one node, this could plausibly lead to 
a fairly problematic scenario if that node fails before the range can be 
repaired.

At the very least, this behaviour can lead to an almost unlimited amount of 
extraneous data being stored until the range is repaired and compaction happens 
to overwrite the sub-range RTs and row deletions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

