[
https://issues.apache.org/jira/browse/CASSANDRA-19260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Petrov updated CASSANDRA-19260:
------------------------------------
Resolution: Fixed
Status: Resolved (was: Open)
> org.apache.cassandra.tcm.ClusterMetadataService#commit does not catch up when
> rejected
> --------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19260
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19260
> Project: Cassandra
> Issue Type: Bug
> Components: Transactional Cluster Metadata
> Reporter: David Capwell
> Assignee: Alex Petrov
> Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, ci_summary.json
>
>
> This was found in the cep-15-accord branch (CASSANDRA-18804) by a simple
> benchmark test:
> 1) deploy a 6-node cluster
> 2) create a table
> 3) in parallel, launch many Accord transactions
> When Accord gets a transaction, it needs to make sure the table is “managed”
> by Accord. It uses TCM for this bookkeeping; this is just a List<TableId>
> in ClusterMetadata. We found that we detect that the table isn’t managed, so
> we try to add it, but we get a reject and the TCM epoch has not moved forward!
> Debugging this, it looks like org.apache.cassandra.tcm.RemoteProcessor#commit
> is the root cause, as it only seems to try to catch up if there is a messaging
> error and not on a TCM rejection! Given that the caller to TCM is not able to
> find the epoch to “wait” on, I feel that this is a TCM issue: TCM normally
> tries to make sure successes/rejects are blocking, but in this one case it
> appears not to.
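
The behaviour described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: the names CatchUpSketch, Log, Node, commitWithoutCatchUp and commitWithCatchUp are hypothetical and are not Cassandra's API, and the code is not the actual patch. It only shows why a rejected commit that skips catch-up leaves the caller with a stale epoch and nothing to wait on.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every name here is hypothetical, not part of Cassandra.
class CatchUpSketch {

    /** Stand-in for the authoritative metadata log. */
    static final class Log {
        private long epoch = 0;
        private final List<String> managedTables = new ArrayList<>();

        /** Returns the new epoch on success, or -1 if the change is rejected. */
        synchronized long tryAdd(String table) {
            if (managedTables.contains(table)) return -1; // already managed: reject
            managedTables.add(table);
            return ++epoch;
        }

        synchronized long epoch() { return epoch; }

        synchronized List<String> snapshot() { return new ArrayList<>(managedTables); }
    }

    /** Stand-in for a node holding a possibly stale local copy of the metadata. */
    static final class Node {
        long localEpoch = 0;
        List<String> localTables = new ArrayList<>();
        private final Log log;

        Node(Log log) { this.log = log; }

        /** Refresh the local view from the log. */
        void catchUp() {
            synchronized (log) {
                localTables = log.snapshot();
                localEpoch = log.epoch();
            }
        }

        /** Behaviour as reported: a rejection returns without catching up. */
        boolean commitWithoutCatchUp(String table) {
            long result = log.tryAdd(table);
            if (result < 0) return false; // rejected, local view stays stale
            catchUp();
            return true;
        }

        /** Assumed desired behaviour: catch up on success and on rejection. */
        boolean commitWithCatchUp(String table) {
            long result = log.tryAdd(table);
            catchUp(); // the caller now sees the epoch that caused the reject
            return result >= 0;
        }
    }

    public static void main(String[] args) {
        Log log = new Log();
        Node a = new Node(log);
        Node b = new Node(log);

        a.commitWithCatchUp("t1");                 // node a wins the race
        boolean ok = b.commitWithoutCatchUp("t1"); // node b is rejected
        System.out.println("rejected=" + !ok + " staleEpoch=" + b.localEpoch);

        b.commitWithCatchUp("t1");                 // rejected again, but caught up
        System.out.println("epochAfterCatchUp=" + b.localEpoch);
    }
}
```

In this model, node b after commitWithoutCatchUp still believes the table is unmanaged and sits at its old epoch, which mirrors the "reject, but the epoch has not moved forward" symptom in the report.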