Benedict Elliott Smith created CASSANDRA-20172:
--------------------------------------------------
Summary: Accord: Fix various bugs, improve burn test reliability
Key: CASSANDRA-20172
URL: https://issues.apache.org/jira/browse/CASSANDRA-20172
Project: Apache Cassandra
Issue Type: Bug
Components: Accord
Reporter: Benedict Elliott Smith
- Fix notifying unmanaged after updating redundantBefore/bootstrap
- Do not infer invalid if we have a single round of replies with minKnown
not decided and maxKnown erased; in this case store the knowledge for the next
request.
- Fix SyncPoint topology selection
- Fix CheckStatusOkFull.with(InvalidIf)
- Fix NotifyWaitingOn
- ExecuteTxn should only contact latest topology for follow-up requests
- DurableBefore.min should not go backwards on new epoch topology; journal
replay was not correctly handling PreApplied; partialTxn can be null if not
owned
- Fix notify pre-bootstrap that arrives post-bootstrap
- Avoid GC race condition on Propagate where we can incorrectly infer a
shard is stale
- Ensure redundantBefore on previously-owned range does not imply
redundant before for overlapping queries on still-owned range
- Ensure we don't mark stale unless all of the quorum we contacted had
erased, else we may have raced with the agreement and erasure
- Fix Invalidate when no route found for FetchData does not report to all
requested local epochs
- Fix WAS_OWNED_RETIRED without durableBefore at Universal can lead to
assertions with RX that we permit to execute but that have not yet
- Fix initialiseWaitingOn can in some cases transitively notify the
command we're updating via maybeCleanup of dependencies, but the command isn't
yet updated so isn't ready
- Fix encountering a command that is pre-bootstrap, and for which we have
locally 'applied' a superseding RX, so that we do not know its outcome locally
(so we do not clean up the command), but it must also have been decided, and we
should not respond with future dependencies.
- Epoch failures on CoordinatePreAccept should trigger the
CoordinatePreAccept failure handler
- Use the shard bound rather than GC bound for fallback dependency
- LatestDeps should be sliced to actual route, so as not to use both
PreAccepted AND Stable deps as though Stable
- Fix various callback issues with node.withEpoch and
Recover/Propose.isDone
- RecoverWithRoute can encounter a partially truncated transaction where
the Deps for one shard are not committed. Must fetch LatestDeps.
- Tighten LatestDeps semantics for Recover
- CommandsForKey: do not restore pruned as APPLIED
- Ensure prune points execute in the epoch in which they are declared
- must merge all fast path votes including those from earlier epochs that
may have witnessed a later transaction
- Recoveries that know the transaction is committed a priori should skip
the Accept phase
- Maintain GC behaviour for redundant commands that are pre-bootstrap
- don't apply ERASE to CommandsForKey to avoid breaking pruning
- Introduce clearBefore to ProgressLog to more consistently handle
cleaning up redundant transactions (and avoid triggering burn test invariants)
- don't replay journal of a bootstrapping node in burn test
- Recover, Accept or Commit reply from epoch that has been retired should
be treated as Success rather than Redundant
- Distinguish completely REDUNDANT+PRE_BOOTSTRAP from partially GC_BEFORE
and REDUNDANT+PRE_BOOTSTRAP - latter can make stronger inferences based on the
GC_BEFORE intersection (could perhaps be treated as simply GC_BEFORE)
- RX must register historical transactions with CFK
- CommandStore.bootstrapper must wait for coordinate sync via same
mechanism as sync()
- Don't start topology change for shard where all replicas are already
bootstrapping
- Reify executes et al in StoreParticipants
- LocalListeners txn listener reentry may erase the entry entirely
- use registerAt in AbstractRequest for expirations, use correct time for
expiresAt in ListAgent
- use txnId.epoch() for pruning, as must be before both txnId and
executeAt of prune point for coordinating dependencies
- compute accurate KnownMap when affected by bootstrap or staleness
- upgradeTruncated should calculate Definition and Deps separately
- Invalidate should not sort before Erased when calculating max reply or
max knowledge reply
- avoid another infinite loop at end of burn test
- avoid another epoch loading edge case
- pass through low/high epochs to ensure we propagate information to all
waiting command stores
- RX must adopt a non-pruned dependency that has a higher TxnId (if it is
itself behind the prune point)
- rejects should also be calculated on COMMITTED started before
- remove Apply Factory wrapper for RX, redundant now we have
CoordinationAdapters (and has faulty epoch logic)
- for RX ensure we return maximum writes for each epoch we intersect
(same effectively as pruning logic)
- rework updateUnmanaged to improve clarity
- BeginRecovery constructor of LatestDeps should use touches() not owns()
to compute localDeps
- BeginRecovery superseding calculation was incorrectly treating
startedBefore Committed and Accepted the same, when the point at which a dep
should be known differs
- Refactor Command visiting, porting C* integration to accord-core
- RelationMultiMap Builder should resize keys and keyLimits independently
- CommandsForKey Serialization moved to accord-core
- losing ownership of range should trigger re-registration of unmanaged
waiting on commit of a no-longer owned txn
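
As a rough illustration of the staleness item above ("don't mark stale unless
all of the quorum we contacted had erased"), the rule might be sketched as
below. This is only a minimal sketch: the class StaleInferenceSketch, the enum
Known and the method mayMarkStale are hypothetical names invented for this
example and are not taken from accord-core; the real reply types and checks
are likely more involved.

```java
import java.util.List;

public class StaleInferenceSketch {
    // Hypothetical reply states for illustration only; not Accord's actual types.
    enum Known { UNDECIDED, DECIDED, ERASED }

    // Infer staleness only when every contacted replica reports ERASED.
    // A single non-erased reply suggests we may have raced with agreement
    // and erasure, so the inference would be unsafe.
    static boolean mayMarkStale(List<Known> quorumReplies) {
        if (quorumReplies.isEmpty()) return false; // no evidence, no inference
        for (Known k : quorumReplies)
            if (k != Known.ERASED) return false;
        return true;
    }

    public static void main(String[] args) {
        // prints true: all contacted replicas had erased
        System.out.println(mayMarkStale(List.of(Known.ERASED, Known.ERASED)));
        // prints false: one replica still knows the decision
        System.out.println(mayMarkStale(List.of(Known.ERASED, Known.DECIDED)));
    }
}
```

The empty-reply case returns false here on the assumption that the absence of
replies should never be read as evidence of erasure.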