David Capwell created CASSANDRA-19260:
-----------------------------------------
Summary: org.apache.cassandra.tcm.ClusterMetadataService#commit does not catch up when rejected
Key: CASSANDRA-19260
URL: https://issues.apache.org/jira/browse/CASSANDRA-19260
Project: Cassandra
Issue Type: Bug
Components: Transactional Cluster Metadata
Reporter: David Capwell

This was found in the cep-15-accord branch (CASSANDRA-18804).

The test that found this was a simple benchmark test:

1) deploy a 6 node cluster
2) create a table
3) in parallel, launch many Accord transactions

When Accord gets a transaction, it needs to make sure the table is “managed” by Accord. Accord uses TCM for this bookkeeping; it is just a List<TableId> in ClusterMetadata. We found that we detect that the table isn’t managed, so we try to add it; we get a rejection, and the TCM epoch has not moved forward!

Debugging this, it looks like org.apache.cassandra.tcm.RemoteProcessor#commit is the root cause, as it only seems to try to catch up if there is a messaging error and not on a TCM rejection!

Given that the caller to TCM is not able to find the epoch to “wait” on, I feel that this is a TCM issue: TCM normally tries to make sure successes and rejections are blocking, but in this one case it appears not to be.
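
The behaviour described above can be illustrated with a small, self-contained sketch. This is not Cassandra code: Log, Node, Result, commitWithoutCatchUp and commitWithCatchUp are hypothetical names invented for the illustration, and the model is heavily simplified (a single in-memory log, no messaging). It only shows why a caller whose commit is rejected is left with a stale epoch unless the commit path also catches up on rejection.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitCatchUpSketch {
    /** Outcome of a commit attempt: whether it was applied, and the log epoch at that point. */
    static final class Result {
        final boolean success;
        final long epoch;
        Result(boolean success, long epoch) { this.success = success; this.epoch = epoch; }
    }

    /** Hypothetical stand-in for the cluster-wide metadata log (not a real TCM type). */
    static final class Log {
        private long epoch = 0;
        private final List<String> managedTables = new ArrayList<>();

        synchronized Result addManagedTable(String tableId) {
            if (managedTables.contains(tableId))
                return new Result(false, epoch); // rejected: another node already added it
            managedTables.add(tableId);
            epoch++;
            return new Result(true, epoch);
        }
    }

    /** Hypothetical stand-in for one node's local view of the metadata. */
    static final class Node {
        long localEpoch = 0;
        private final Log log;
        Node(Log log) { this.log = log; }

        private void catchUpTo(long epoch) { localEpoch = Math.max(localEpoch, epoch); }

        /** Simplified model of the reported behaviour: no catch-up when rejected. */
        Result commitWithoutCatchUp(String tableId) {
            Result r = log.addManagedTable(tableId);
            if (r.success) catchUpTo(r.epoch);
            return r;
        }

        /** Behaviour the report expects: success and rejection both leave the node caught up. */
        Result commitWithCatchUp(String tableId) {
            Result r = log.addManagedTable(tableId);
            catchUpTo(r.epoch);
            return r;
        }
    }

    public static void main(String[] args) {
        Log log = new Log();
        Node winner = new Node(log);
        Node loser = new Node(log);

        winner.commitWithCatchUp("t1"); // wins the race to mark the table as managed
        Result rejected = loser.commitWithoutCatchUp("t1");
        System.out.println("rejected=" + !rejected.success + " localEpoch=" + loser.localEpoch);

        loser.commitWithCatchUp("t1"); // rejected again, but now the local view advances
        System.out.println("localEpoch after catch-up=" + loser.localEpoch);
    }
}
```

In this model, the node that loses the race still sees the table as unmanaged after the rejection, which matches the symptom in the report: there is no newer epoch for the caller to wait on.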