[
https://issues.apache.org/jira/browse/CASSANDRA-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109647#comment-17109647
]
Ekaterina Dimitrova edited comment on CASSANDRA-15685 at 5/17/20, 9:19 PM:
---------------------------------------------------------------------------
After a couple of hundred more runs of this test (my gut feeling told me I was
missing something), I confirmed that the lossy notifications are not the
primary issue with this test.
Even when we catch the success/error notifications and the "success" and
"wasConsistent" flags are set properly, the preview repair still finds the
incremental repair running:
{code:java}
[junit-timeout] java.lang.RuntimeException: Repair session
82ff3420-9737-11ea-b32d-7fa12d874715 for range [(-1,9223372036854775805],
(9223372036854775805,-1]] failed with error An incremental repair with session
id 82eff1e0-9737-11ea-b32d-7fa12d874715 finished during this preview repair
runtime
{code}
It turns out that receiving the notification does not always mean that the rest
of the nodes have already been informed about the completion. I can easily
increase the wait before the preview repair starts.
However, [~dcapwell] and I are considering opening a separate ticket, as other
parts of the code or tools might rely solely on the notifications to detect
completion. That is worth checking.
Also, tomorrow I am going to look in detail at how we can improve this test so
that it relies not on timing but probably on some metadata.
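To illustrate the idea of waiting on state rather than on a fixed delay, here is a minimal sketch. It is not the actual fix and does not use any Cassandra API: the class name {{AwaitSessionCompletion}}, the method {{awaitAll}}, and the per-node checks are all hypothetical. Each check stands in for whatever metadata lookup would tell us that a given node already considers the incremental repair session finished.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical test helper (not part of Cassandra): polls per-node checks
 * until all of them pass, instead of sleeping for a fixed amount of time.
 */
final class AwaitSessionCompletion {
    private AwaitSessionCompletion() {}

    /**
     * @param nodeChecks   one check per node; each returns true once that node
     *                     reports the session as finished (assumed lookup)
     * @param timeout      maximum total time to wait
     * @param pollInterval pause between polling rounds
     * @return true if every check passed before the timeout, false otherwise
     */
    static boolean awaitAll(List<BooleanSupplier> nodeChecks,
                            Duration timeout,
                            Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            boolean allDone = true;
            for (BooleanSupplier check : nodeChecks) {
                if (!check.getAsBoolean()) {
                    allDone = false;
                    break;
                }
            }
            if (allDone) {
                return true;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In the test, the preview repair would start only after {{awaitAll}} returns true, and a false result would fail the test with a clear message rather than letting it hit the race.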
> flaky testWithMismatchingPending -
> org.apache.cassandra.distributed.test.PreviewRepairTest
> ------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15685
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Kevin Gallardo
> Assignee: Ekaterina Dimitrova
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: log-CASSANDRA-15685.txt, output
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Observed in:
> https://app.circleci.com/pipelines/github/newkek/cassandra/34/workflows/1c6b157d-13c3-48a9-85fb-9fe8c153256b/jobs/191/tests
> Failure:
> {noformat}
> testWithMismatchingPending -
> org.apache.cassandra.distributed.test.PreviewRepairTest
> junit.framework.AssertionFailedError
> at org.apache.cassandra.distributed.test.PreviewRepairTest.testWithMismatchingPending(PreviewRepairTest.java:97)
> {noformat}
> [Circle CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FCASSANDRA-15685]