[
https://issues.apache.org/jira/browse/CASSANDRA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053647#comment-17053647
]
Kevin Gallardo edited comment on CASSANDRA-15543 at 3/6/20, 7:45 PM:
---------------------------------------------------------------------
Also, I suggested we wait for all responses because this is what the test
expects, the test waits for the {{ReadFailureException}} to contain *both*
failures without accounting for possible race conditions of the
{{ReadCallback}} returning before all responses have come back.
EDIT: I confused the number of received and the number of failures, I think
what is below is incorrect.
-Also noticed that in the current context, the number of {{failure}} and the
list of failures can be inconsistent with each other, because we create the
exception with-
{code:java}
new ReadFailureException(replicaPlan().consistencyLevel(), received, blockFor,
resolver.isDataPresent(), failureReasonByEndpoint){code}
-Concurrently, {{received}} can have been incremented but not the map of
{{failureReasonByEndpoint}} and we could end up with:-
{code:java}
Operation failed - received 1 responses and 2 failures: INCOMPATIBLE_SCHEMA
from 127.0.0.3:7012{code}
-"2" failures but only 1 error message. (number of {{received}} is irrelevant)-
-Conversely, since the reference to the map is passed to
{{ReadFailureException}} but {{received}} is passed by value, we can end up in
the following scenario:-
- -1 error triggers awaitResults() to unblock-
- -call to {{new ReadFailureException(received, failureReasonByEndpoint)}}-
- -value of 1 for {{received}} copied when calling {{ReadFailureException}}
constructor but only reference to {{failureReasonByEndpoint}} given-
- -concurrently second failure calls to {{onFailure()}} which updates
{{failures}} and the map-
- -{{ReadFailureException}} has the {{1}} value for {{failures}} but the map
has been updated to have 2 failures and constructs the error message-
-We end up with the error message:-
{code:java}
Operation failed - received 1 responses and 1 failures: INCOMPATIBLE_SCHEMA
from 127.0.0.3:7012, INCOMPATIBLE_SCHEMA from 127.0.0.2:7012{code}
-"1" failure but 2 error messages. (number of {{received}} is irrelevant)-
-One way to solve these issues would be to pass a copy of the
{{failureReasonByEndpoint}} to the constructor of {{ReadFailureException}}.-
-If {{await()}} waited until all messages came back or timeout the 2 issues
wouldn't be problematic I think.-
was (Author: newkek):
Also, I suggested we wait for all responses because this is what the test
expects, the test waits for the {{ReadFailureException}} to contain *both*
failures without accounting for possible race conditions of the
{{ReadCallback}} returning before all responses have come back.
Also noticed that in the current context, the number of {{failure}} and the
list of failures can be inconsistent with each other, because we create the
exception with {code}new ReadFailureException(replicaPlan().consistencyLevel(),
received, blockFor, resolver.isDataPresent(), failureReasonByEndpoint){code}.
Concurrently, {{received}} can have been incremented but not the map of
{{failureReasonByEndpoint}} and we could end up with:
{code}Operation failed - received 1 responses and 2 failures:
INCOMPATIBLE_SCHEMA from 127.0.0.3:7012{code}
"2" failures but only 1 error message. (number of {{received}} is irrelevant)
Conversely, since the reference to the map is passed to
{{ReadFailureException}} but {{received}} is passed by value, we can end up in
the following scenario:
- 1 error triggers awaitResults() to unblock
- call to {{new ReadFailureException(received, failureReasonByEndpoint)}}
- value of 1 for {{received}} copied when calling {{ReadFailureException}}
constructor but only reference to {{failureReasonByEndpoint}} given
- concurrently second failure calls to {{onFailure()}} which updates
{{failures}} and the map
- {{ReadFailureException}} has the {{1}} value for {{failures}} but the map has
been updated to have 2 failures and constructs the error message
We end up with the error message:
{code}Operation failed - received 1 responses and 1 failures:
INCOMPATIBLE_SCHEMA from 127.0.0.3:7012, INCOMPATIBLE_SCHEMA from
127.0.0.2:7012{code}
"1" failure but 2 error messages. (number of {{received}} is irrelevant)
One way to solve these issues would be to pass a copy of the
{{failureReasonByEndpoint}} to the constructor of {{ReadFailureException}}.
If {{await()}} waited until all messages came back or timeout the 2 issues
wouldn't be problematic I think.
> flaky test
> org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement
> -----------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15543
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15543
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: David Capwell
> Assignee: Kevin Gallardo
> Priority: Normal
> Fix For: 4.0-alpha
>
>
> This fails infrequently, last seen failure was on java 8
> {code}
> junit.framework.AssertionFailedError
> at
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest.readWithSchemaDisagreement(DistributedReadWritePathTest.java:276)
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]