[
https://issues.apache.org/jira/browse/CASSANDRA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205065#comment-17205065
]
Adam Holmberg commented on CASSANDRA-15993:
-------------------------------------------
There are a couple of things going on here.
The main source of timeouts was issuing all the CREATE VIEW statements at once:
The view builders run asynchronously in the background. As each new CREATE VIEW
is issued, the schema migration causes any in-progress build to stop and
restart anew. This snowballs into repeated stopping and restarting of builds,
which eventually makes one of the later DDL statements time out (they would
complete with a longer timeout).
Changing the dtest to simply wait for each build synchronously removes that
contention. Although the views are now built serially, avoiding the contention
means there is no increase in runtime. My test setup, which previously failed
about 1 in 12 runs, now runs cleanly hundreds of times with just this change
and no increased timeout.
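A minimal sketch of the serial approach, not the code from the patch: each view
is created and then polled until its build finishes before the next CREATE VIEW
is issued. The names create_views_serially, wait_until, and the is_built
callback are hypothetical; in a real dtest the callback could, for instance,
check the system.built_views table.

```python
import time


def wait_until(predicate, timeout=60.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return
        if clock() >= deadline:
            raise TimeoutError("condition not met within {}s".format(timeout))
        sleep(interval)


def create_views_serially(session, views, is_built,
                          build_timeout=60.0, sleep=time.sleep):
    """Create each view and block until it is built before the next DDL.

    views:    iterable of (view_name, create_statement) pairs.
    is_built: hypothetical callback, is_built(view_name) -> bool.
    """
    for name, statement in views:
        session.execute(statement)
        # Waiting here means the next schema migration cannot interrupt
        # a build that is still in progress.
        wait_until(lambda: is_built(name), timeout=build_timeout, sleep=sleep)
```

The only requirement on session is an execute(statement) method, so the helper
can be exercised with a stub as well as with a real driver session.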
However, there's another mechanism here that makes me think we should raise the
timeouts as well:
It just so happens that [this
task|https://github.com/aholmberg/cassandra/blob/c6ef4762eeee78ec783b77faa367e82d9b1ffabc/src/java/org/apache/cassandra/service/CassandraDaemon.java#L406-L414]
is scheduled to run at roughly the same time that the
[drop_keyspace|https://github.com/aholmberg/cassandra-dtest/blob/efc64a670955eaf91533911c7cbbb792fd5add19/materialized_views_test.py#L236]
DDL is usually running, creating another chance for contention as the build
tasks are stopped for schema migration. I haven't been able to reproduce it,
but my theory is that this is what's causing the [occasional
timeout|https://ci-cassandra.apache.org/job/Cassandra-3.11/lastCompletedBuild/testReport/dtest-novnode.materialized_views_test/TestMaterializedViews/test_view_metadata_cleanup/]
observed in 3.11. Rather than trying to tune the timing around that, my
suggestion is simply to raise the request timeout for any DDL that runs while
views are present in the keyspace.
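As a rough illustration of that suggestion (a sketch, not the code from the
patch), a helper could pass a longer per-request timeout to the driver's
Session.execute whenever views exist in the keyspace. The helper name
execute_ddl, the views_present flag, and the 60-second value are assumptions
made for the example.

```python
# Hypothetical value; the right number depends on the test environment.
DDL_TIMEOUT_WITH_VIEWS = 60.0


def execute_ddl(session, statement, views_present,
                long_timeout=DDL_TIMEOUT_WITH_VIEWS):
    """Run a DDL statement, raising the request timeout if views are present.

    session is expected to behave like the Python driver's Session, whose
    execute() accepts a per-request `timeout` keyword argument.
    """
    if views_present:
        # Schema migration has to stop any running view builds first,
        # so give the request more time to complete.
        return session.execute(statement, timeout=long_timeout)
    # No views: fall back to the session's default timeout.
    return session.execute(statement)
```

For example, execute_ddl(session, "DROP KEYSPACE ks", views_present=True) would
issue the drop with the longer timeout.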
[patch|https://github.com/aholmberg/cassandra-dtest/commit/efc64a670955eaf91533911c7cbbb792fd5add19]
[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15993]
> Fix flaky python dtest test_view_metadata_cleanup -
> materialized_views_test.TestMaterializedViews
> -------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15993
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> E cassandra.OperationTimedOut: errors={'127.0.0.2': 'Client request
> timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.2
> cassandra/cluster.py:4026: OperationTimedOut
> {code}