David Capwell created CASSANDRA-19260:
-----------------------------------------

             Summary: org.apache.cassandra.tcm.ClusterMetadataService#commit 
does not catch up when rejected
                 Key: CASSANDRA-19260
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19260
             Project: Cassandra
          Issue Type: Bug
          Components: Transactional Cluster Metadata
            Reporter: David Capwell


This was found in the cep-15-accord branch (CASSANDRA-18804) by a simple 
benchmark test:

1) deploy a 6 node cluster
2) create a table
3) in parallel launch many accord transactions

When Accord gets a transaction, it needs to make sure the table is “managed” by 
Accord. Accord uses TCM for this bookkeeping; it is just a List<TableId> in 
ClusterMetadata.  We found that we detect that the table isn’t managed, so we 
try to add it, but we get a reject and the TCM epoch has not moved forward!
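
To illustrate the sequence above, here is a minimal sketch of the check-then-add 
race. Everything in it is hypothetical (Authority, Node, Snapshot, tryAddTable, 
ensureManaged, and plain strings as table ids); it is a toy in-memory model, not 
the Accord or TCM code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model; none of these types are Cassandra's real classes.
final class ManagedTableRaceSketch {
    /** Immutable snapshot: an epoch plus the ids of the tables managed by Accord. */
    static final class Snapshot {
        final long epoch;
        final List<String> managedTables;

        Snapshot(long epoch, List<String> managedTables) {
            this.epoch = epoch;
            this.managedTables = Collections.unmodifiableList(new ArrayList<>(managedTables));
        }

        boolean isManaged(String tableId) {
            return managedTables.contains(tableId);
        }
    }

    /** Stand-in for the authoritative copy of the metadata. */
    static final class Authority {
        private Snapshot current = new Snapshot(1, Collections.<String>emptyList());

        /** Adds the table and bumps the epoch, or returns null to signal a rejection. */
        synchronized Snapshot tryAddTable(String tableId) {
            if (current.isManaged(tableId)) return null; // someone else already added it
            List<String> next = new ArrayList<>(current.managedTables);
            next.add(tableId);
            current = new Snapshot(current.epoch + 1, next);
            return current;
        }
    }

    /** A node holding a local, possibly stale, view of the metadata. */
    static final class Node {
        Snapshot local = new Snapshot(1, Collections.<String>emptyList());

        /** Check-then-add; returns true if the table is known to be managed on return. */
        boolean ensureManaged(Authority authority, String tableId) {
            if (local.isManaged(tableId)) return true;
            Snapshot committed = authority.tryAddTable(tableId);
            if (committed == null) return false; // rejected, and the local view is NOT refreshed
            local = committed;
            return true;
        }
    }

    public static void main(String[] args) {
        Authority authority = new Authority();
        Node a = new Node();
        Node b = new Node();
        System.out.println("a accepted: " + a.ensureManaged(authority, "tbl"));
        System.out.println("b accepted: " + b.ensureManaged(authority, "tbl"));
        System.out.println("b local epoch: " + b.local.epoch);
        System.out.println("b sees tbl as managed: " + b.local.isManaged("tbl"));
    }
}
```

In this model node b is rejected while its local epoch stays at 1 and the table 
still looks unmanaged to it, which mirrors the symptom described above.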

Debugging this, it looks like org.apache.cassandra.tcm.RemoteProcessor#commit 
is the root cause, as it only seems to try to catch up after a messaging error 
and not after a TCM rejection!  Given that the caller to TCM is not able to 
find the epoch to “wait” on, I feel that this is a TCM issue: TCM normally 
tries to make sure both successes and rejects are blocking, but in this one 
case it appears not to.
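
As a rough sketch of the behaviour being asked for, the toy model below compares 
a commit that returns immediately on rejection with one that catches up first. 
The names Log, Processor, tryAppend, entriesAfter, and catchUpOnReject are all 
assumptions made for illustration; this is not the real RemoteProcessor code, 
and the actual fix may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not the real RemoteProcessor or ClusterMetadataService.
final class CatchUpOnRejectSketch {
    /** Stand-in for the authoritative log: epoch n corresponds to entries.get(n - 1). */
    static final class Log {
        private final List<String> entries = new ArrayList<>();

        /** Appends the entry unless it is already present; returns the new epoch, or -1 if rejected. */
        synchronized long tryAppend(String entry) {
            if (entries.contains(entry)) return -1;
            entries.add(entry);
            return entries.size();
        }

        /** Returns a copy of every entry committed after the given epoch. */
        synchronized List<String> entriesAfter(long epoch) {
            return new ArrayList<>(entries.subList((int) epoch, entries.size()));
        }
    }

    /** A node-local processor that forwards commits and tracks what it has applied locally. */
    static final class Processor {
        private final Log log;
        private final boolean catchUpOnReject;
        final List<String> applied = new ArrayList<>();

        Processor(Log log, boolean catchUpOnReject) {
            this.log = log;
            this.catchUpOnReject = catchUpOnReject;
        }

        long localEpoch() {
            return applied.size();
        }

        /** Returns true if the commit was accepted, false if it was rejected. */
        boolean commit(String entry) {
            long epoch = log.tryAppend(entry);
            if (epoch < 0 && !catchUpOnReject) return false; // reported behaviour: no catch-up
            catchUp(); // the local view is current before returning, on success or rejection
            return epoch >= 0;
        }

        private void catchUp() {
            applied.addAll(log.entriesAfter(localEpoch()));
        }
    }

    public static void main(String[] args) {
        Log log = new Log();
        new Processor(log, false).commit("add tbl"); // first writer wins

        Processor stale = new Processor(log, false);
        Processor fixed = new Processor(log, true);
        System.out.println("stale accepted: " + stale.commit("add tbl") + ", epoch " + stale.localEpoch());
        System.out.println("fixed accepted: " + fixed.commit("add tbl") + ", epoch " + fixed.localEpoch());
    }
}
```

Both processors are rejected here, but only the one that catches up on rejection 
returns with a local view that contains the change that caused the rejection, so 
its caller has no need to guess an epoch to wait on.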



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

