[https://issues.apache.org/jira/browse/CASSANDRA-15993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205065#comment-17205065]

Adam Holmberg commented on CASSANDRA-15993:
-------------------------------------------

There are a couple of things going on here.

The main source of timeouts was issuing all the CREATE VIEW statements at once.
The view builders run asynchronously in the background. As each new CREATE VIEW
is issued, the schema migration causes any in-progress build to stop and
restart anew. This causes a snowball of stopping and restarting builds that
then makes one of the later DDL statements time out (they would complete with
longer timeouts).
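
To make the snowball concrete, here is a toy model in Python (the dtests'
language). It is not Cassandra's view builder: it assumes, hypothetically, that
each view needs a fixed amount of uninterrupted build work and that every
CREATE VIEW restarts all builds still in progress, as described above, and it
counts the work thrown away. The function name and parameters are made up for
illustration.

```python
def wasted_build_work(num_views, build_cost, create_gap):
    """Toy model (not Cassandra's actual view-builder logic).

    View i is created at time i * create_gap and needs build_cost units of
    uninterrupted work. Every later CREATE VIEW restarts each build that is
    still in progress, discarding the progress made so far. Returns the total
    discarded work.
    """
    wasted = 0.0
    for i in range(num_views):
        start = i * create_gap  # when view i's current build attempt began
        for j in range(i + 1, num_views):
            restart_at = j * create_gap  # the next CREATE VIEW arrives
            progress = restart_at - start
            if progress >= build_cost:
                break  # view i finished building before this CREATE VIEW
            wasted += progress  # in-progress build is stopped, progress lost
            start = restart_at  # ...and the build restarts from scratch
    return wasted


# All views issued back to back (gap shorter than one build):
print(wasted_build_work(num_views=5, build_cost=10, create_gap=2))   # 20.0
# Waiting for each build before the next CREATE VIEW (gap >= build_cost):
print(wasted_build_work(num_views=5, build_cost=10, create_gap=10))  # 0.0
```

In this model the discarded work grows quadratically with the number of views
(create_gap * n * (n - 1) / 2 when the gap is shorter than a build), while
creating views one at a time after each build finishes discards none.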

Changing the dtest to simply wait for each build to finish before issuing the
next CREATE VIEW removes that contention. Although the views are then built
serially, avoiding the contention means there is no increase in runtime. My
test setup, which previously failed 1 in 12 runs, now runs hundreds of times
without failure with just this change and no increased timeout.
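
A minimal sketch of waiting for each view build before issuing the next CREATE
VIEW, written against the DataStax Python driver's
`Session.execute(query, parameters)` call. The helper names
(`wait_for_view_build`, `create_views_serially`) are hypothetical and are not
the dtest code from the patch. The sketch assumes that the node-local
`system.built_views` table is a good enough signal that a build has finished;
because it only reflects the node that serves the query, a multi-node test
would need to check every node.

```python
import time

# Node-local table listing the views whose build has completed on that node.
BUILT_VIEWS_QUERY = (
    "SELECT view_name FROM system.built_views "
    "WHERE keyspace_name = %s AND view_name = %s"
)


def wait_for_view_build(session, keyspace, view, timeout=120.0, interval=0.5):
    """Poll until `view` appears in system.built_views (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        if list(session.execute(BUILT_VIEWS_QUERY, (keyspace, view))):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"view {keyspace}.{view} not built after {timeout}s")
        time.sleep(interval)


def create_views_serially(session, keyspace, view_ddls, **wait_kwargs):
    """Issue each CREATE VIEW only after the previous view finished building,
    so no schema migration interrupts a build that is still running.

    view_ddls is a list of (view_name, create_view_statement) pairs.
    """
    for view_name, ddl in view_ddls:
        session.execute(ddl)
        wait_for_view_build(session, keyspace, view_name, **wait_kwargs)
```

Serial creation trades parallelism for the absence of restarts; per the
observation above, that trade costs no runtime in this test.
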
However, there's another mechanism here that makes me think we should raise the 
timeouts as well:

It just so happens that
[this task|https://github.com/aholmberg/cassandra/blob/c6ef4762eeee78ec783b77faa367e82d9b1ffabc/src/java/org/apache/cassandra/service/CassandraDaemon.java#L406-L414]
is scheduled to run at roughly the same time that the
[drop_keyspace|https://github.com/aholmberg/cassandra-dtest/blob/efc64a670955eaf91533911c7cbbb792fd5add19/materialized_views_test.py#L236]
DDL usually runs, creating another chance for contention as the build tasks are
stopped for the schema migration. I haven't been able to reproduce it, but my
theory is that this is what causes the
[occasional timeout|https://ci-cassandra.apache.org/job/Cassandra-3.11/lastCompletedBuild/testReport/dtest-novnode.materialized_views_test/TestMaterializedViews/test_view_metadata_cleanup/]
observed in 3.11. Rather than trying to work around that timing, I suggest
simply raising the request timeout for any DDL issued while views are present
in the keyspace.
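
One way to express that suggestion with the Python driver is the per-request
`timeout` argument of `Session.execute`, the same knob the error message below
points at (the driver's `Session.default_timeout` is 10 seconds). This is a
hypothetical sketch: the 60-second value, the helper names, and the check
against `system_schema.views` are illustrative assumptions, not what the patch
does.

```python
# Hypothetical values; the driver's own default request timeout is 10 s.
DEFAULT_DDL_TIMEOUT = 10.0
VIEW_DDL_TIMEOUT = 60.0

VIEWS_IN_KEYSPACE_QUERY = (
    "SELECT view_name FROM system_schema.views WHERE keyspace_name = %s"
)


def ddl_timeout(session, keyspace):
    """Use a longer client timeout when the keyspace has materialized views,
    since the schema change may first have to stop in-progress view builds."""
    has_views = bool(list(session.execute(VIEWS_IN_KEYSPACE_QUERY, (keyspace,))))
    return VIEW_DDL_TIMEOUT if has_views else DEFAULT_DDL_TIMEOUT


def execute_ddl(session, keyspace, statement):
    """Run a DDL statement with the timeout chosen by ddl_timeout()."""
    return session.execute(statement, timeout=ddl_timeout(session, keyspace))


# Usage (assuming `session` is a connected cassandra.cluster.Session):
# execute_ddl(session, "ks", "DROP KEYSPACE ks")
```

Raising the timeout only for keyspaces with views keeps failures fast for the
rest of the suite while giving the view-related DDL room to finish.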

[patch|https://github.com/aholmberg/cassandra-dtest/commit/efc64a670955eaf91533911c7cbbb792fd5add19]
[ci|https://app.circleci.com/pipelines/github/aholmberg/cassandra?branch=CASSANDRA-15993]

> Fix flaky python dtest test_view_metadata_cleanup - 
> materialized_views_test.TestMaterializedViews
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15993
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: David Capwell
>            Assignee: Adam Holmberg
>            Priority: Normal
>             Fix For: 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/355/workflows/7b8df61d-706f-4094-a206-7cdc6b4e0451/jobs/1818
> {code}
> E   cassandra.OperationTimedOut: errors={'127.0.0.2': 'Client request 
> timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.2
> cassandra/cluster.py:4026: OperationTimedOut
> {code}


