[
https://issues.apache.org/jira/browse/CASSANDRA-17005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599515#comment-17599515
]
Ekaterina Dimitrova commented on CASSANDRA-17005:
-------------------------------------------------
I also saw this today on 4.1.
It may be worth checking at some point, as these timeouts seem to pop up
consistently only for this test:
https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/1880/workflows/85df925c-da13-430e-99d1-1203f6440c41/jobs/14862/tests#failed-test-0
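For context, the failure is the Python driver's client-side request timeout, not a
server-side one. A minimal sketch of how it could be raised, assuming the plain
cassandra-driver API; the contact point and the 60-second value are illustrative
assumptions, not what the dtest actually configures:
{code:python}
# Minimal sketch (illustrative values): two ways to raise the client-side
# request timeout behind cassandra.OperationTimedOut.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

# Option 1: set a larger default request_timeout (the driver default is
# 10 seconds) via the default execution profile.
profile = ExecutionProfile(request_timeout=60)
cluster = Cluster(['127.0.0.1'],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# Option 2: override per request with the timeout kwarg the error message
# points at ("See Session.execute[_async](timeout)").
rows = session.execute("SELECT COUNT(*) FROM ks.cf LIMIT 200", timeout=60)
print(rows.one())
{code}
In the quoted stacktrace below, assert_one calls session.execute() with no explicit
timeout, so presumably the driver's 10-second default is what expires.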
> Fix test
> dtest.repair_tests.incremental_repair_test.TestIncRepair.test_multiple_repair
> --------------------------------------------------------------------------------------
>
> Key: CASSANDRA-17005
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17005
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: Ekaterina Dimitrova
> Assignee: Berenguer Blasi
> Priority: Normal
> Fix For: 4.0.x, 4.1
>
>
> [dtest.repair_tests.incremental_repair_test.TestIncRepair.test_multiple_repair|https://jenkins-cm4.apache.org/job/Cassandra-devbranch/1143/testReport/junit/dtest.repair_tests.incremental_repair_test/TestIncRepair/test_multiple_repair/]
> is flaky:
> {code:python}
> Error Message
> cassandra.OperationTimedOut: errors={'127.0.0.2:9042': 'Client request
> timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.2:9042
>
> Stacktrace
> self = <repair_tests.incremental_repair_test.TestIncRepair object at 0x7ff52f4f4fd0>
>
>     def test_multiple_repair(self):
>         """
>         * Launch a three node cluster
>         * Create a keyspace with RF 3 and a table
>         * Insert 49 rows
>         * Stop node3
>         * Insert 50 more rows
>         * Restart node3
>         * Issue an incremental repair on node3
>         * Stop node2
>         * Insert a final 50 rows
>         * Restart node2
>         * Issue an incremental repair on node2
>         * Replace node3 with a new node
>         * Verify data integrity
>         # TODO: Several more verifications of data need to be interspersed
>         throughout the test. The final assertion is insufficient.
>         @jira_ticket CASSANDRA-10644
>         """
>         cluster = self.cluster
>         cluster.populate(3).start()
>         node1, node2, node3 = cluster.nodelist()
>
>         session = self.patient_cql_connection(node1)
>         create_ks(session, 'ks', 3)
>         if cluster.version() < '4.0':
>             create_cf(session, 'cf', read_repair=0.0,
>                       columns={'c1': 'text', 'c2': 'text'})
>         else:
>             create_cf(session, 'cf', columns={'c1': 'text', 'c2': 'text'})
>
>         logger.debug("insert data")
>         insert_c1c2(session, keys=list(range(1, 50)),
>                     consistency=ConsistencyLevel.ALL)
>         node1.flush()
>
>         logger.debug("bringing down node 3")
>         node3.flush()
>         node3.stop(gently=False)
>
>         logger.debug("inserting additional data into node 1 and 2")
>         insert_c1c2(session, keys=list(range(50, 100)),
>                     consistency=ConsistencyLevel.TWO)
>         node1.flush()
>         node2.flush()
>
>         logger.debug("restarting and repairing node 3")
>         node3.start(wait_for_binary_proto=True)
>         if cluster.version() >= "2.2":
>             node3.repair()
>         else:
>             node3.nodetool("repair -par -inc")
>
>         # wait stream handlers to be closed on windows
>         # after session is finished (See CASSANDRA-10644)
>         if is_win:
>             time.sleep(2)
>
>         logger.debug("stopping node 2")
>         node2.stop(gently=False)
>
>         logger.debug("inserting data in nodes 1 and 3")
>         insert_c1c2(session, keys=list(range(100, 150)),
>                     consistency=ConsistencyLevel.TWO)
>         node1.flush()
>         node3.flush()
>
>         logger.debug("start and repair node 2")
>         node2.start(wait_for_binary_proto=True)
>         if cluster.version() >= "2.2":
>             node2.repair()
>         else:
>             node2.nodetool("repair -par -inc")
>
>         logger.debug("replace node and check data integrity")
>         node3.stop(gently=False)
>         node5 = Node('node5', cluster, True, ('127.0.0.5', 9160), ('127.0.0.5', 7000),
>                      '7500', '0', None, ('127.0.0.5', 9042))
>         cluster.add(node5, False, data_center="dc1")
>         node5.start(replace_address='127.0.0.3')
>
> >       assert_one(session, "SELECT COUNT(*) FROM ks.cf LIMIT 200", [149])
>
> repair_tests/incremental_repair_test.py:300:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> tools/assertions.py:130: in assert_one
>     res = session.execute(simple_query)
> ../venv/src/cassandra-driver/cassandra/cluster.py:2618: in execute
>     return self.execute_async(query, parameters, trace, custom_payload, timeout,
>                               execution_profile, paging_state, host, execute_as).result()
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> self = <ResponseFuture: query='<SimpleStatement query="SELECT COUNT(*) FROM ks.cf
> LIMIT 200", consistency=Not Set>' request_i...9042': 'Client request timeout.
> See Session.execute[_async](timeout)'}, last_host=127.0.0.2:9042
> coordinator_host=None>
>
>     def result(self):
>         """
>         Return the final result or raise an Exception if errors were encountered.
>
>         If the final result or error has not been set yet, this method will
>         block until it is set, or the timeout set for the request expires.
>         Timeout is specified in the Session request execution functions.
>         If the timeout is exceeded, an :exc:`cassandra.OperationTimedOut`
>         will be raised. This is a client-side timeout. For more information
>         about server-side coordinator timeouts, see :class:`.policies.RetryPolicy`.
>
>         Example usage::
>
>             >>> future = session.execute_async("SELECT * FROM mycf")
>             >>> # do other stuff...
>
>             >>> try:
>             ...     rows = future.result()
>             ...     for row in rows:
>             ...         ... # process results
>             ... except Exception:
>             ...     log.exception("Operation failed:")
>         """
>         self._event.wait()
>         if self._final_result is not _NOT_SET:
>             return ResultSet(self, self._final_result)
>         else:
> >           raise self._final_exception
> E           cassandra.OperationTimedOut: errors={'127.0.0.2:9042': 'Client request
> timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.2:9042
>
> ../venv/src/cassandra-driver/cassandra/cluster.py:4894: OperationTimedOut
> {code}