[ 
https://issues.apache.org/jira/browse/CASSANDRA-21455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-21455:
----------------------------------------
    Change Category: Operability
         Complexity: Normal
      Fix Version/s: 6.x
          Reviewers: Sam Tunnicliffe
             Status: Open  (was: Triage Needed)

I've combined the patches for CASSANDRA-21454, CASSANDRA-21455 & 
CASSANDRA-21456 and am running CI against that branch.

> Unable to catch up TCM Log from peer with gaps in log sequence
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-21455
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21455
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 6.x
>
>
> The triggering scenario is a non-CMS peer that receives a snapshot while 
> catching up from a peer or CMS node, which leaves a gap near the tail of its 
> local metadata log. If that node subsequently receives catch-up requests 
> itself, the heuristic that prioritises sending recent log entries over recent 
> snapshots is counterproductive. 
> For example, a node receives log entries with epochs A and B, misses entries 
> C, D and E, and catches up from a peer, receiving a snapshot at epoch F.
> Later, when another node asks this peer for its log since epoch B:
>  1. listSnapshotsSince(B) returns exactly 1 snapshot (at epoch F)
>  2. Since snapshotEpochs.size() <= 1, the code tries getEntries(B), which 
> returns an empty list (or entries only up to B, with nothing after)
>  3. An empty list of entries vacuously satisfies isContinuous(), so the code 
> returns LogState(null, []) without the snapshot
>  4. The requesting node is stuck: it keeps getting empty responses and never 
> advances past the gap
> The proposed fix is in LogReader.getLogState. After confirming that the 
> entries are continuous, check whether there is a snapshot beyond the latest 
> entry. If so, fall through to serve the snapshot instead.
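To illustrate the scenario and the proposed check, here is a minimal, self-contained sketch. It is not the actual Cassandra code: the class LogStateSketch, its fields, the use of plain long epochs, and the boolean flag that toggles the extra check are all assumptions made for illustration. Only the names getLogState, getEntries, listSnapshotsSince, isContinuous and LogState come from the description above, and their real signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical, simplified model of a node's local metadata log.
// Epochs are plain longs; this is NOT Cassandra's real LogReader API.
final class LogStateSketch {
    /** Reply to a catch-up request: optional base snapshot plus log entries. */
    static final class LogState {
        final Long snapshotEpoch;   // null when no snapshot is included
        final List<Long> entries;   // epochs of the log entries being sent

        LogState(Long snapshotEpoch, List<Long> entries) {
            this.snapshotEpoch = snapshotEpoch;
            this.entries = entries;
        }
    }

    final NavigableSet<Long> entryEpochs = new TreeSet<>();
    final NavigableSet<Long> snapshotEpochs = new TreeSet<>();

    /** Entries with an epoch strictly greater than 'since', in order. */
    List<Long> getEntries(long since) {
        return new ArrayList<>(entryEpochs.tailSet(since, false));
    }

    /** Snapshots with an epoch strictly greater than 'since', in order. */
    List<Long> listSnapshotsSince(long since) {
        return new ArrayList<>(snapshotEpochs.tailSet(since, false));
    }

    /** True if entries are since+1, since+2, ...; vacuously true when empty. */
    static boolean isContinuous(long since, List<Long> entries) {
        long expected = since + 1;
        for (long epoch : entries) {
            if (epoch != expected)
                return false;
            expected++;
        }
        return true;
    }

    /**
     * @param checkSnapshotBeyondEntries false models the behaviour described
     *        in the ticket; true models the proposed extra check.
     */
    LogState getLogState(long since, boolean checkSnapshotBeyondEntries) {
        List<Long> snapshots = listSnapshotsSince(since);
        if (snapshots.size() <= 1) {
            List<Long> entries = getEntries(since);
            if (isContinuous(since, entries)) {
                long latest = entries.isEmpty() ? since : entries.get(entries.size() - 1);
                boolean snapshotBeyond = !snapshots.isEmpty()
                        && snapshots.get(snapshots.size() - 1) > latest;
                // Without the check, an empty (vacuously continuous) list is
                // returned even though a later snapshot exists.
                if (!checkSnapshotBeyondEntries || !snapshotBeyond)
                    return new LogState(null, entries);
                // Otherwise fall through and serve the snapshot.
            }
        }
        if (snapshots.isEmpty())
            return new LogState(null, getEntries(since)); // best effort in this sketch
        long base = snapshots.get(snapshots.size() - 1);
        return new LogState(base, getEntries(base));
    }
}
```

With entries at epochs 1 and 2 (A and B) and a snapshot at epoch 6 (F), getLogState(2, false) returns no snapshot and no entries, which is the stuck case in steps 1 to 4. With the check enabled, getLogState(2, true) returns the snapshot at epoch 6, so the requester can advance past the gap.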



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
