Sam Tunnicliffe created CASSANDRA-21455:
-------------------------------------------

             Summary: Unable to catch up TCM Log from peer with gaps in log 
sequence
                 Key: CASSANDRA-21455
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21455
             Project: Apache Cassandra
          Issue Type: Improvement
          Components: Transactional Cluster Metadata
            Reporter: Sam Tunnicliffe
            Assignee: Jon Meredith


The triggering scenario for this is a non-CMS peer receiving a snapshot when 
catching up from a peer/CMS node, leaving a gap near the tail of its local 
metadata log. If that node subsequently receives catchup requests itself, 
the heuristic that prioritises sending recent log entries over recent snapshots 
is counterproductive. 

For example, a node receives log entries with epochs A and B, misses entries 
C, D and E, then catches up from a peer, receiving a snapshot at epoch F.
Later, when another node asks this node for its log since epoch B:
 1. listSnapshotsSince(B) returns exactly 1 snapshot (at epoch F).
 2. Since snapshotEpochs.size() <= 1, the code tries getEntries(B), which 
returns nothing (or entries only up to B with nothing after).
 3. An empty set of entries is vacuously isContinuous(), so the method returns 
LogState(null, []) without the snapshot.
 4. The requesting node is stuck: it keeps getting empty responses and never 
advances past the gap.
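
The sequence above can be reproduced with a small, simplified model. This is 
only a sketch under assumptions: epochs are plain longs (A=1, B=2, F=6), the 
class GapCatchupSketch and its LogState are hypothetical stand-ins rather than 
the real Cassandra types, and the fallback branch at the end of getLogState is 
assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified model of the behaviour described in this ticket.
// It is not the actual Cassandra implementation.
final class GapCatchupSketch {
    /** Stand-in for a log state: an optional snapshot epoch plus entry epochs. */
    static final class LogState {
        final Long snapshotEpoch;   // null means no snapshot is included
        final List<Long> entries;

        LogState(Long snapshotEpoch, List<Long> entries) {
            this.snapshotEpoch = snapshotEpoch;
            this.entries = entries;
        }
    }

    final TreeSet<Long> entryEpochs = new TreeSet<>();
    final TreeSet<Long> snapshotEpochs = new TreeSet<>();

    /** Snapshot epochs strictly greater than the given epoch. */
    List<Long> listSnapshotsSince(long since) {
        return new ArrayList<>(snapshotEpochs.tailSet(since, false));
    }

    /** Entry epochs strictly greater than the given epoch. */
    List<Long> getEntries(long since) {
        return new ArrayList<>(entryEpochs.tailSet(since, false));
    }

    /** True if entries are since+1, since+2, ...; vacuously true when empty. */
    static boolean isContinuous(List<Long> entries, long since) {
        long expected = since + 1;
        for (long epoch : entries) {
            if (epoch != expected) return false;
            expected++;
        }
        return true;
    }

    LogState getLogState(long since) {
        List<Long> snapshots = listSnapshotsSince(since);
        if (snapshots.size() <= 1) {
            List<Long> entries = getEntries(since);
            // An empty list passes this check, so the snapshot is never served.
            if (isContinuous(entries, since))
                return new LogState(null, entries);
        }
        // Assumed fallback: latest snapshot plus any entries after it.
        Long latest = snapshots.isEmpty() ? null : snapshots.get(snapshots.size() - 1);
        return new LogState(latest, getEntries(latest == null ? since : latest));
    }

    public static void main(String[] args) {
        GapCatchupSketch peer = new GapCatchupSketch();
        peer.entryEpochs.add(1L);     // epoch A
        peer.entryEpochs.add(2L);     // epoch B
        peer.snapshotEpochs.add(6L);  // snapshot at epoch F; C, D, E are missing

        LogState state = peer.getLogState(2L); // "log since epoch B"
        // prints "snapshot=null entries=[]"
        System.out.println("snapshot=" + state.snapshotEpoch + " entries=" + state.entries);
    }
}
```

In this model the requester at epoch B receives neither entries nor the 
snapshot, so repeating the request can never move it forward.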

The proposed fix is in LogReader.getLogState. After confirming entries are 
continuous, check if there's a snapshot beyond the latest entry. If so, fall 
through to serve the snapshot instead.
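
One way the extra check could look is sketched below. It is an illustration 
under assumptions, not the actual patch: epochs are plain longs, the class 
GapAwareLogReaderSketch and its LogState are hypothetical stand-ins, and when 
no entries are returned the requested epoch is treated as the latest entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified model of the proposed check; not the actual patch.
final class GapAwareLogReaderSketch {
    /** Stand-in for a log state: an optional snapshot epoch plus entry epochs. */
    static final class LogState {
        final Long snapshotEpoch;   // null means no snapshot is included
        final List<Long> entries;

        LogState(Long snapshotEpoch, List<Long> entries) {
            this.snapshotEpoch = snapshotEpoch;
            this.entries = entries;
        }
    }

    final TreeSet<Long> entryEpochs = new TreeSet<>();
    final TreeSet<Long> snapshotEpochs = new TreeSet<>();

    List<Long> listSnapshotsSince(long since) {
        return new ArrayList<>(snapshotEpochs.tailSet(since, false));
    }

    List<Long> getEntries(long since) {
        return new ArrayList<>(entryEpochs.tailSet(since, false));
    }

    static boolean isContinuous(List<Long> entries, long since) {
        long expected = since + 1;
        for (long epoch : entries) {
            if (epoch != expected) return false;
            expected++;
        }
        return true;
    }

    LogState getLogState(long since) {
        List<Long> snapshots = listSnapshotsSince(since);
        if (snapshots.size() <= 1) {
            List<Long> entries = getEntries(since);
            if (isContinuous(entries, since)) {
                // Assumption: with no entries, the requested epoch is the tail.
                long latestEntry = entries.isEmpty() ? since : entries.get(entries.size() - 1);
                boolean snapshotBeyond = !snapshots.isEmpty() && snapshots.get(0) > latestEntry;
                // Only prefer entries when no snapshot lies past them.
                if (!snapshotBeyond)
                    return new LogState(null, entries);
            }
        }
        // Fall through: serve the latest snapshot plus any entries after it.
        Long latest = snapshots.isEmpty() ? null : snapshots.get(snapshots.size() - 1);
        return new LogState(latest, getEntries(latest == null ? since : latest));
    }

    public static void main(String[] args) {
        GapAwareLogReaderSketch peer = new GapAwareLogReaderSketch();
        peer.entryEpochs.add(1L);     // epoch A
        peer.entryEpochs.add(2L);     // epoch B
        peer.snapshotEpochs.add(6L);  // snapshot at epoch F

        LogState state = peer.getLogState(2L);
        // prints "snapshot=6 entries=[]"
        System.out.println("snapshot=" + state.snapshotEpoch + " entries=" + state.entries);
    }
}
```

In this model, a peer whose entries already extend past its only snapshot 
still returns entries alone, so the preference for recent log entries is kept 
wherever it is useful.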



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
