[ https://issues.apache.org/jira/browse/CASSANDRA-18347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-18347:
----------------------------------------
Resolution: Not A Problem
Status: Resolved (was: Open)
> CEP-21: Startup failures in Python dtests around TCM_REPLAY_REQ
> ---------------------------------------------------------------
>
> Key: CASSANDRA-18347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18347
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership, Cluster/Schema
> Reporter: Caleb Rackliffe
> Priority: Normal
> Fix For: NA
>
>
> There are currently widespread, locally reproducible failures in the Python
> dtests against the {{cep-21-tcm}} branch. For example...
>
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra topology_test.py::TestTopology::test_decommissioned_node_cant_rejoin{noformat}
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra materialized_views_test.py::TestMaterializedViews::test_query_new_column{noformat}
> {noformat}pytest --cassandra-dir=/Users/maedhroz/Forks/cassandra read_repair_test.py::TestSpeculativeReadRepair::test_normal_read_repair{noformat}
> https://app.circleci.com/pipelines/github/maedhroz/cassandra/701/workflows/44a5c7e0-0de0-4839-bbd0-80771fe23843/jobs/7251
> https://app.circleci.com/pipelines/github/beobal/cassandra/406/workflows/00cdb02e-4b3e-477a-b997-403121172249/jobs/4204/tests
> The death spiral in the node startup logs starts like this…
> {noformat}
> WARN [Messaging-EventLoop-3-1] 2023-03-17 11:55:34,037 NoSpamLogger.java:108 - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message of type TCM_REPLAY_REQ whose timeout expired before reaching the network
> ERROR [InternalResponseStage:3] 2023-03-17 11:55:34,038 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000], checkLive=false}
> INFO [Messaging-EventLoop-3-12] 2023-03-17 11:55:34,099 InboundConnectionInitiator.java:567 - /127.0.0.2:7000(/127.0.0.2:49763)->/127.0.0.2:7000-SMALL_MESSAGES-1b9301b6 messaging connection established, version = 13, framing = CRC, encryption = unencrypted
> INFO [Messaging-EventLoop-3-9] 2023-03-17 11:55:34,099 OutboundConnection.java:1164 - /127.0.0.2:7000(/127.0.0.2:49763)->/127.0.0.2:7000-SMALL_MESSAGES-a9302b2e successfully connected, version = 13, framing = CRC, encryption = unencrypted
> WARN [InternalMetadataStage:5] 2023-03-17 11:55:34,100 NoSpamLogger.java:108 - Not currently a member of the CMS
> INFO [Messaging-EventLoop-3-13] 2023-03-17 11:55:34,102 InboundConnectionInitiator.java:567 - /127.0.0.2:7000(/127.0.0.2:49764)->/127.0.0.2:7000-URGENT_MESSAGES-f887f6fa messaging connection established, version = 13, framing = CRC, encryption = unencrypted
> INFO [Messaging-EventLoop-3-11] 2023-03-17 11:55:34,102 OutboundConnection.java:1164 - /127.0.0.2:7000(/127.0.0.2:49764)->/127.0.0.2:7000-URGENT_MESSAGES-5cd0c637 successfully connected, version = 13, framing = CRC, encryption = unencrypted
> ERROR [InternalResponseStage:4] 2023-03-17 11:55:49,237 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN [InternalMetadataStage:8] 2023-03-17 11:55:49,394 NoSpamLogger.java:108 - Not currently a member of the CMS
> WARN [Messaging-EventLoop-3-1] 2023-03-17 11:56:04,636 NoSpamLogger.java:108 - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message of type TCM_REPLAY_REQ whose timeout expired before reaching the network
> ERROR [InternalResponseStage:5] 2023-03-17 11:56:04,637 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN [InternalMetadataStage:11] 2023-03-17 11:56:04,892 NoSpamLogger.java:108 - Not currently a member of the CMS
> ...
> ERROR [InternalResponseStage:6] 2023-03-17 11:56:20,335 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000], checkLive=false}
> WARN [InternalMetadataStage:14] 2023-03-17 11:56:20,391 NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:7] 2023-03-17 11:56:21,750 RemoteProcessor.java:164 - Got error from /127.0.0.3:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000], checkLive=false}
> WARN [Messaging-EventLoop-3-1] 2023-03-17 11:56:35,535 NoSpamLogger.java:108 - /127.0.0.2:7000->/127.0.0.1:7000-SMALL_MESSAGES-[no-channel] dropping message of type TCM_REPLAY_REQ whose timeout expired before reaching the network
> ERROR [InternalResponseStage:8] 2023-03-17 11:56:35,537 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN [InternalMetadataStage:17] 2023-03-17 11:56:35,693 NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:9] 2023-03-17 11:56:37,135 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN [InternalMetadataStage:20] 2023-03-17 11:56:37,540 NoSpamLogger.java:108 - Not currently a member of the CMS
> ERROR [InternalResponseStage:10] 2023-03-17 11:56:50,935 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> WARN [InternalMetadataStage:23] 2023-03-17 11:56:51,191 NoSpamLogger.java:108 - Not currently a member of the CMS
> {noformat}
> ...and ends here:
> {noformat}
> ERROR [InternalResponseStage:11] 2023-03-17 11:56:53,036 RemoteProcessor.java:164 - Got error from /127.0.0.1:7000: TIMEOUT when sending TCM_REPLAY_REQ, retrying on CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false}
> Exception (java.lang.IllegalStateException) encountered during startup: Could not succeed sending TCM_REPLAY_REQ to CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false} after 10 tries
> ERROR [main] 2023-03-17 11:56:53,546 CassandraDaemon.java:929 - Exception encountered during startup
> java.lang.IllegalStateException: Could not succeed sending TCM_REPLAY_REQ to CandidateIterator{candidates=[/127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.3:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.1:7000, /127.0.0.2:7000, /127.0.0.3:7000, /127.0.0.1:7000], checkLive=false} after 10 tries
> at org.apache.cassandra.tcm.RemoteProcessor.sendWithCallback(RemoteProcessor.java:181)
> at org.apache.cassandra.tcm.RemoteProcessor.replayAndWait(RemoteProcessor.java:118)
> at org.apache.cassandra.tcm.ClusterMetadataService$SwitchableProcessor.replayAndWait(ClusterMetadataService.java:577)
> at org.apache.cassandra.tcm.Startup.initializeForDiscovery(Startup.java:149)
> at org.apache.cassandra.tcm.Startup.initialize(Startup.java:84)
> at org.apache.cassandra.tcm.Startup.initialize(Startup.java:59)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:267)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:777)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:907)
> ...
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)