[ 
https://issues.apache.org/jira/browse/CASSANDRA-18471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716029#comment-17716029
 ] 

David Capwell commented on CASSANDRA-18471:
-------------------------------------------

Spoke with [~benedict] in Slack (see 
https://the-asf.slack.com/archives/C0459N9R5C6/p1682368538520669). It looks 
like the test is not correct, and the issue is the following message ordering:

{code}
{from:4, to:2, id:107, body:BeginInvalidate{txnId:[1,4602982,3,2], ballot:[1,4602976,0,4]}}
{from:2, to:4, replyTo:107, body:InvalidatePromised{NotWitnessed,#13085}}
{from:2, to:6, id:162, body:PreAccept{txnId:[1,4602982,3,2], txn:{read:[Range(#34185, #37124]]}, scope:[Range(#34185, #37124]]}}
{from:6, to:2, replyTo:162, body:PreAcceptOk{txnId:[1,4602982,3,2], witnessedAt:[1,4602989,0,6], deps:[Range(#31944, #32765], Range(#32765, #34401], Range(#34401, #36858], Range(#36858, #39318]]:{9:[[1,4602974,2,3]]}, {}}}
{code}

We have a NotWitnessed command whose promised ballot we update, but it looks 
like we then override "promised" with Ballot.ZERO, ignoring the earlier promise 
altogether.  We should not override it, or else we lose this state.
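
To make the failure mode concrete, here is a minimal sketch of the idea, not the actual Accord code. All names in it (PromisedSketch, the simplified Ballot and Command, preacceptOverriding, preacceptPreserving) are hypothetical stand-ins, and the real fix may look different (for example, it may reject the PreAccept instead). It only shows that a promise recorded while the command is NotWitnessed disappears if a later step resets "promised" to Ballot.ZERO.

```java
// Hypothetical, simplified stand-ins; these are NOT Accord's real classes.
final class PromisedSketch
{
    /** Simplified ballot ordered by a single counter; ZERO is the minimum. */
    static final class Ballot implements Comparable<Ballot>
    {
        static final Ballot ZERO = new Ballot(0);
        final long value;

        Ballot(long value) { this.value = value; }

        @Override
        public int compareTo(Ballot that) { return Long.compare(value, that.value); }
    }

    enum Status { NotWitnessed, PreAccepted }

    /** Per-transaction state held by a replica (assumed shape). */
    static final class Command
    {
        Status status = Status.NotWitnessed;
        Ballot promised = Ballot.ZERO;

        /** Records the promise if the ballot is strictly higher; returns true if granted. */
        boolean beginInvalidate(Ballot ballot)
        {
            if (ballot.compareTo(promised) <= 0)
                return false;
            promised = ballot;
            return true;
        }

        /** Buggy variant: resets promised to ZERO, losing any earlier promise. */
        void preacceptOverriding()
        {
            status = Status.PreAccepted;
            promised = Ballot.ZERO;
        }

        /** Sketched alternative: leaves the existing promised ballot untouched. */
        void preacceptPreserving()
        {
            status = Status.PreAccepted;
        }
    }
}
```

With the overriding variant, a command that promised a ballot during BeginInvalidate ends up with promised == Ballot.ZERO after the preaccept step; with the preserving variant it still holds the promised ballot.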

> CEP-15 Accord: BurnTest fails with Received replies from a node that must 
> have known the route, but that did not include it
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18471
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18471
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Accord
>            Reporter: David Capwell
>            Priority: Normal
>             Fix For: 5.x
>
>
> While working on CASSANDRA-18451 I hit the following failure
> {code}
> Failed on seed -5929214838499924343
> accord.burn.SimulationException: Failed on seed -5929214838499924343
> Caused by: java.lang.AssertionError: Unexpected exception encountered
>       at accord.impl.basic.PropagatingPendingQueue.poll(PropagatingPendingQueue.java:73)
>       at accord.impl.basic.Cluster.processPending(Cluster.java:179)
>       at accord.impl.basic.Cluster.run(Cluster.java:296)
>       at accord.burn.BurnTest.burn(BurnTest.java:309)
>       at accord.burn.BurnTest.run(BurnTest.java:386)
>       at accord.burn.BurnTest.testOne(BurnTest.java:372)
>       Suppressed: java.lang.IllegalStateException: Received replies from a node that must have known the route, but that did not include it
>               at accord.coordinate.Invalidate.invalidate(Invalidate.java:204)
>               at accord.coordinate.Invalidate.handle(Invalidate.java:131)
>               at accord.coordinate.Invalidate.onSuccess(Invalidate.java:105)
>               at accord.coordinate.Invalidate.onSuccess(Invalidate.java:51)
>               at accord.impl.basic.Cluster.lambda$processNext$1(Cluster.java:209)
>               at accord.impl.basic.Cluster.now(Cluster.java:260)
>               at accord.impl.basic.Cluster.processNext(Cluster.java:206)
>               at accord.impl.basic.Cluster.processPending(Cluster.java:183)
> {code}
> Using a debugger, I was able to figure out the state and create a unit test 
> that hits the same situation:
> {code}
> class InvalidateTest
> {
>     @Test
>     void test() throws ExecutionException
>     {
>         try (MockCluster cluster = MockCluster.builder().replication(2).nodes(2).build())
>         {
>             Node n1 = cluster.get(1);
>             Node n2 = cluster.get(2);
>             RoutingKey n1RoutingKey = n1.topology().current().get(0).range.end();
>             IntKey.Raw n1key = IntKey.key(((IntKey.Routing) n1RoutingKey).key);
>             RoutingKey n2RoutingKey = n1.topology().current().get(1).range.end();
>             IntKey.Raw n2key = IntKey.key(((IntKey.Routing) n2RoutingKey).key);
>             Keys keys = Keys.of(n1key, n2key);
>             Node coordinator = n1;
>             TxnId txnId = coordinator.nextTxnId(Txn.Kind.Read, Routable.Domain.Key);
>             Txn txn = readOnly(keys);
>             AsyncChains.getUninterruptibly(n2.commandStores().unsafeForKey(n2key).execute(PreLoadContext.contextFor(txnId, keys), store -> {
>                 Ranges ranges = store.ranges().currentRanges();
>                 PartialTxn partial = txn.slice(ranges, true);
>                 FullKeyRoute route = keys.toRoute(n2RoutingKey);
> //                RoutingKey progressKey = n2RoutingKey.toUnseekable(); // if this is non-null this passes
>                 RoutingKey progressKey = null;
>                 CheckedCommands.preaccept(store, txnId, partial, route, progressKey);
>                 CheckedCommands.accept(store, txnId, Ballot.ZERO, route.slice(ranges), partial.keys().slice(ranges), progressKey, txnId, PartialDeps.builder(ranges).build());
>             }));
>             AsyncChains.getUninterruptibly(new AsyncChains.Head<Outcome>() {
>                 @Override
>                 protected void start(BiConsumer<? super Outcome, Throwable> callback) {
>                     Invalidate.invalidate(coordinator, txnId, keys.toUnseekables(), callback);
>                 }
>             });
>         }
>     }
>     private static Txn readOnly(Seekables<?, ?> keys)
>     {
>         Read read = MockStore.read(keys);
>         Query query = Mockito.mock(Query.class);
>         return new Txn.InMemory(keys, read, query);
>     }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
