[jira] [Created] (CASSANDRA-20172) Accord: Fix various bugs, improve burn test reliability

Benedict Elliott Smith (Jira) Sun, 29 Dec 2024 04:12:45 -0800

Benedict Elliott Smith created CASSANDRA-20172:
--------------------------------------------------


             Summary: Accord: Fix various bugs, improve burn test reliability
                 Key: CASSANDRA-20172
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20172
             Project: Apache Cassandra
          Issue Type: Bug
          Components: Accord
            Reporter: Benedict Elliott Smith


     - Fix notifying unmanaged after update redundant before/bootstrap
     - Do not infer invalid if we have a single round of replies with minKnown 
not decided and maxKnown erased - in this case store the knowledge for next 
request.
     - Fix SyncPoint topology selection
     - Fix CheckStatusOkFull.with(InvalidIf)
     - Fix NotifyWaitingOn
     - ExecuteTxn should only contact latest topology for follow-up requests
     - DurableBefore.min should not go backwards on new epoch topology
     - Journal replay was not correctly handling PreApplied
     - partialTxn can be null if not owned
     - Fix notify pre-bootstrap that arrives post-bootstrap
     - Avoid GC race condition on Propagate where we can incorrectly infer a 
shard is stale
     - Ensure redundantBefore on previously-owned range does not imply 
redundant before for overlapping queries on still-owned range
     - Ensure we don't mark stale unless all of the quorum we contacted had 
erased, else we may have raced with the agreement and erase
     - Fix Invalidate, when no route is found for FetchData, not reporting to 
all requested local epochs
     - Fix WAS_OWNED_RETIRED without durableBefore at Universal can lead to 
assertions with RX that we permit to execute but that have not yet
     - Fix initialiseWaitingOn can in some cases transitively notify the 
command we're updating via maybeCleanup of dependencies, but the command isn't 
yet updated so isn't ready
     - Fix encountering a command that is pre-bootstrap, and for which we have 
locally 'applied' a superseding RX, so that we do not know its outcome locally 
(so we do not clean up the command), but also it must have been decided - and we 
should not respond with future dependencies.
     - Epoch failures on CoordinatePreAccept should trigger the 
CoordinatePreAccept failure handler
     - Use the shard bound rather than GC bound for fallback dependency
     - LatestDeps should be sliced to actual route, so as not to use both 
PreAccepted AND Stable deps as though Stable
     - Fix various callback issues with node.withEpoch and 
Recover/Propose.isDone
     - RecoverWithRoute can encounter a partially truncated transaction where 
the Deps for one shard are not committed. Must fetch LatestDeps.
     - Tighten LatestDeps semantics for Recover
     - CommandsForKey: do not restore pruned as APPLIED
     - Ensure prune points execute in the epoch in which they are declared
     - must merge all fast path votes including those from earlier epochs that 
may have witnessed a later transaction
     - Recoveries that know the transaction is committed a priori should skip 
the Accept phase
     - Maintain GC behaviour for redundant commands that are pre-bootstrap
     - don't apply ERASE to CommandsForKey to avoid breaking pruning
     - Introduce clearBefore to ProgressLog to more consistently handle 
cleaning up redundant transactions (and avoid triggering burn test invariants)
     - don't replay journal of a bootstrapping node in burn test
     - Recover, Accept or Commit reply from epoch that has been retired should 
be treated as Success rather than Redundant
     - Distinguish completely REDUNDANT+PRE_BOOTSTRAP from partially GC_BEFORE 
and REDUNDANT+PRE_BOOTSTRAP - latter can make stronger inferences based on the 
GC_BEFORE intersection (could perhaps be treated as simply GC_BEFORE)
     - RX must register historical transactions with CFK
     - CommandStore.bootstrapper must wait for coordinate sync via same 
mechanism as sync()
     - Don't start topology change for shard where all replicas are already 
bootstrapping
     - Reify executes et al in StoreParticipants
     - LocalListeners txn listener reentry may erase the entry entirely
     - use registerAt in AbstractRequest for expirations, use correct time for 
expiresAt in ListAgent
     - use txnId.epoch() for pruning, as must be before both txnId and 
executeAt of prune point for coordinating dependencies
     - compute accurate KnownMap when affected by bootstrap or staleness
     - upgradeTruncated should calculate Definition and Deps separately
     - Invalidate should not sort before Erased when calculating max reply or 
max knowledge reply
     - avoid another infinite loop at end of burn test
     - avoid another epoch loading edge case
     - pass through low/high epochs to ensure we propagate information to all 
waiting command stores
     - RX must adopt a non-pruned dependency that has a higher TxnId (if is 
itself behind prune point)
     - rejects should also be calculated on COMMITTED started before
     - remove Apply Factory wrapper for RX, redundant now we have 
CoordinationAdapters (and has faulty epoch logic)
     - for RX ensure we return maximum writes for each epoch we intersect 
(same effectively as pruning logic)
     - rework updateUnmanaged to improve clarity
     - BeginRecovery constructor of LatestDeps should use touches() not owns() 
to compute localDeps
     - BeginRecovery superseding calculation was incorrectly treating 
startedBefore Committed and Accepted the same, when the point at which a dep 
should be known differs
     - Refactor Command visiting, porting C* integration to accord-core
     - RelationMultiMap Builder should resize keys and keyLimits independently
     - CommandsForKey Serialization moved to accord-core
     - losing ownership of range should trigger re-registration of unmanaged 
waiting on commit of a no-longer owned txn
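
As an illustration of one item above ("Ensure we don't mark stale unless all of the quorum we contacted had erased"), the rule can be sketched roughly as follows. This is a minimal sketch only: the class, enum and method names are hypothetical and are not taken from accord-core, and the real logic works on much richer reply state.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from accord-core.
final class StaleInferenceSketch {
    // Simplified stand-in for what a contacted replica reports about a transaction.
    enum Reply { ERASED, UNDECIDED, DECIDED }

    // Infer staleness only when every reply in the contacted quorum is ERASED.
    // A single non-erased reply means we may have raced with agreement and
    // erasure, so no inference is made.
    static boolean mayMarkStale(List<Reply> quorumReplies) {
        if (quorumReplies.isEmpty()) return false; // no evidence, no inference
        for (Reply reply : quorumReplies) {
            if (reply != Reply.ERASED) return false;
        }
        return true;
    }
}
```

Under these assumptions, a quorum of [ERASED, ERASED] would permit the inference, while [ERASED, UNDECIDED] would not.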



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

