[
https://issues.apache.org/jira/browse/CASSANDRA-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109647#comment-17109647
]
Ekaterina Dimitrova edited comment on CASSANDRA-15685 at 5/17/20, 9:21 PM:
---------------------------------------------------------------------------
After a couple of hundred more runs of this test (my gut feeling told me that I
was missing something), it was confirmed that the lossy notifications are not the
primary issue with this test.
In some cases, even when we catch the success/error notifications and the
flags "success" and "wasConsistent" are properly set, the PreviewRepair
still shows the Incremental Repair as running.
{code:java}
[junit-timeout] java.lang.RuntimeException: Repair session
82ff3420-9737-11ea-b32d-7fa12d874715 for range [(-1,9223372036854775805],
(9223372036854775805,-1]] failed with error An incremental repair with session
id 82eff1e0-9737-11ea-b32d-7fa12d874715 finished during this preview repair
runtime
{code}
It turns out that receiving the notification doesn't always mean that the rest of
the nodes have already been informed about the completion. I can easily increase
the wait time before the preview repair starts.
However, [~dcapwell] and I were considering opening a ticket, as there might be
other parts of the code or tools that rely only on the notifications for
completion. That is worth checking.
Also, tomorrow I am going to check in detail how we can improve this test so
that it does not rely on timing but rather on some metadata.
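As a rough illustration of "not relying on timing", here is a minimal sketch of a polling helper that waits for a condition rather than sleeping for a fixed period. It is only an assumption of what the shape could look like: ConditionPoller and awaitTrue are hypothetical names, not existing Cassandra or dtest APIs, and the condition passed in stands in for whatever metadata check we end up using (for example, every node reporting the incremental repair session as finished).

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Cassandra: polls a condition instead of sleeping for a fixed time. */
final class ConditionPoller {
    private ConditionPoller() {}

    /**
     * Re-evaluates the condition every interval until it holds or the timeout elapses.
     * Returns true if the condition was observed to hold, false on timeout or interruption.
     */
    static boolean awaitTrue(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so the check stays correct if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition (assumption): in the real test this would query
        // repair-session metadata on each node instead of checking the clock.
        long readyAt = System.currentTimeMillis() + 200;
        boolean ready = awaitTrue(() -> System.currentTimeMillis() >= readyAt,
                                  Duration.ofSeconds(5), Duration.ofMillis(20));
        System.out.println("condition met before timeout: " + ready);
    }
}
```

With something like this, the test would start the preview repair only once awaitTrue returns true, and fail with a clear message if it returns false, so a slow node makes the test wait longer instead of making it flaky.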
> flaky testWithMismatchingPending -
> org.apache.cassandra.distributed.test.PreviewRepairTest
> ------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15685
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15685
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Kevin Gallardo
> Assignee: Ekaterina Dimitrova
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: log-CASSANDRA-15685.txt, output
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Observed in:
> https://app.circleci.com/pipelines/github/newkek/cassandra/34/workflows/1c6b157d-13c3-48a9-85fb-9fe8c153256b/jobs/191/tests
> Failure:
> {noformat}
> testWithMismatchingPending -
> org.apache.cassandra.distributed.test.PreviewRepairTest
> junit.framework.AssertionFailedError
> at
> org.apache.cassandra.distributed.test.PreviewRepairTest.testWithMismatchingPending(PreviewRepairTest.java:97)
> {noformat}
> [Circle
> CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FCASSANDRA-15685]