[
https://issues.apache.org/jira/browse/CASSANDRA-16680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352536#comment-17352536
]
Andres de la Peña commented on CASSANDRA-16680:
-----------------------------------------------
I think there are two separate causes for the failures.
One is that both
[{{testDropExpiredSSTables}}|https://github.com/apache/cassandra/blob/ce877cbe2b7c11355b07cac6f1996a9c9006d89f/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java#L258]
and
[{{testDropOverlappingExpiredSSTables}}|https://github.com/apache/cassandra/blob/ce877cbe2b7c11355b07cac6f1996a9c9006d89f/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java#L306]
create an expiring table with a TTL of one second and then verify the
next compaction task, assuming that the TTL hasn't expired yet
([here|https://github.com/apache/cassandra/blob/ce877cbe2b7c11355b07cac6f1996a9c9006d89f/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java#L306]
and
[here|https://github.com/apache/cassandra/blob/ce877cbe2b7c11355b07cac6f1996a9c9006d89f/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java#L336-L338]).
In a slow CI run that check can happen after the one-second TTL has
expired, so the assert fails. The proposed PR [simply uses a TTL of 10
seconds|https://github.com/apache/cassandra/pull/1026/commits/1504474dc3453905558766ba44c05802eeb06635],
which seems long enough to survive 10K multiplexer runs. It would be ideal to
change the test so it isn't based on sleeps, but I'm afraid that would require
some refactoring outside the test, and I'm not sure we want to do that at this
point.
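To illustrate the race, and what a sleep-free variant might look like, here is a minimal sketch. It is not taken from the Cassandra code base: {{TtlClockSketch}}, {{isExpired}} and the fake clock are hypothetical names, and the expiry rule is a simplification assumed for the example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these names come from the Cassandra code base.
class TtlClockSketch
{
    /** True once the clock reaches the write time plus the TTL (simplified rule). */
    static boolean isExpired(long writeMillis, int ttlSeconds, long nowMillis)
    {
        return nowMillis >= writeMillis + ttlSeconds * 1000L;
    }

    public static void main(String[] args)
    {
        // A controllable clock stands in for System.currentTimeMillis().
        AtomicLong fakeNow = new AtomicLong(0L);
        LongSupplier clock = fakeNow::get;

        long writeMillis = clock.getAsLong();

        // Simulate a slow CI run: 1.5 seconds pass before the check.
        fakeNow.addAndGet(1_500L);
        System.out.println("1s TTL expired:  " + isExpired(writeMillis, 1, clock.getAsLong()));
        System.out.println("10s TTL expired: " + isExpired(writeMillis, 10, clock.getAsLong()));

        // Expiry is then reached by advancing the clock, not by sleeping.
        fakeNow.addAndGet(10_000L);
        System.out.println("10s TTL expired after advance: "
                           + isExpired(writeMillis, 10, clock.getAsLong()));
    }
}
```

With a one-second TTL the simulated delay is already enough to expire the data before the check, while the ten-second TTL leaves headroom. An injectable clock would remove the dependency on wall-clock time altogether, at the cost of the refactoring mentioned above.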
The second problem happens when
[{{testDropOverlappingExpiredSSTables}}|https://github.com/apache/cassandra/blob/ce877cbe2b7c11355b07cac6f1996a9c9006d89f/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java#L314]
creates an sstable with a TTLed row, and a second sstable with another version
of the same row without TTL and with an older timestamp. The intention is that
the TTLed row should supersede the non-TTLed row. The problem is that the
timestamps assigned to each row are based on separate calls to
{{System.currentTimeMillis()}}, in such a way that a slow run can produce the
opposite ordering of timestamps, so the non-TTLed row supersedes the TTLed one.
The proposed solution is to base the values of both timestamps on the same
call to {{System.currentTimeMillis()}}, as done
[here|https://github.com/apache/cassandra/pull/1026/commits/ad602fa02135be6a1a43e6fe4dd87c9915885f66].
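The idea can be sketched roughly as follows. This is only an illustration, not the code in the commit: {{TimestampOrderingSketch}}, {{timestampsFrom}} and the one-second offset are hypothetical.

```java
// Hypothetical sketch; the names and the one-second offset are illustrative only.
class TimestampOrderingSketch
{
    /** Derives {newer, older} write timestamps from a single clock reading. */
    static long[] timestampsFrom(long nowMillis)
    {
        long newer = nowMillis;          // intended for the TTLed row
        long older = nowMillis - 1_000L; // intended for the non-TTLed row
        return new long[]{ newer, older };
    }

    public static void main(String[] args)
    {
        // Read the clock exactly once; both timestamps derive from this value.
        long now = System.currentTimeMillis();
        long[] ts = timestampsFrom(now);

        // The ordering no longer depends on how long the test takes to run.
        System.out.println("TTLed row has the newer timestamp: " + (ts[0] > ts[1]));
    }
}
```

Because both values come from one reading, the time elapsed between the two writes can't flip their relative order.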
The PR also includes some minor cosmetic changes and fixes for typos and IDE
warnings, in [this
commit|https://github.com/apache/cassandra/pull/1026/commits/9d80053e5053f23a3eade9f294dd1fd2436d96c0].
The test has passed 10K multiplexer runs with
[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/515/workflows/e22d1a2f-bdc9-4bbb-a8d0-f2302bf406fd]
and
[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/515/workflows/2fa6965f-b5f1-4897-b3c0-186a6aa9f530].
> TimeWindowCompactionStrategyTest flaky
> --------------------------------------
>
> Key: CASSANDRA-16680
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16680
> Project: Cassandra
> Issue Type: Bug
> Components: CI
> Reporter: Ekaterina Dimitrova
> Assignee: Andres de la Peña
> Priority: Normal
> Fix For: 4.0-rc
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Seen in Jenkins:
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/785/
> Failed two times with the multiplexer
> [https://app.circleci.com/pipelines/github/adelapena/cassandra/461/workflows/7a837b82-c0d1-4e10-8932-c5908d2585de/jobs/4114]