[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214926#comment-17214926
 ] 

Adam Holmberg commented on CASSANDRA-15996:
-------------------------------------------

bq. instead of relying on the patient cql connection, lets add flags to wait 
for the binary protocol and other startup stuff to complete
The test already waits for binary protocol log-wise, and the connection is 
established after that. I'm not sure what else we would add. 

bq. NoSpamLogger has some shuffling of instances around that maybe have a 
concurrency hole, maybe I am just imagining things. 
I've stared at this quite a bit and I am reasonably confident there is not an 
issue with those mappings. Reasoning in part is as we have mentioned there is 
only a single request in-flight. The other is that no matter what kind of race 
we could come up with, worst case scenario is we create new wrappers -- there 
are no runtime errors and it's still using the same logger internally (if it 
was even the same key). Incidentally I have also never seen another 
{{NoSpamLogger}} message across thousands of runs of this test.

With that in mind I stared a bit more at the [other 
thing|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/utils/NoSpamLogger.java#L78-L82]
 that could cause this not to be logged. {{minIntervalNanos}} is coming from a 
[static 
field|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/db/ExpirationDateOverflowHandling.java#L39]
 and guaranteed to be set to a known value. {{expected}} is the default 
zero-initialized value of an AtomicInteger. {{nowNanos}}, on the other hand, is 
coming from 
[{{System.nanoTime}}|https://github.com/apache/cassandra/blob/699a1f74fcc1da1952da6b2b0309c9e2474c67f4/src/java/org/apache/cassandra/utils/NoSpamLogger.java#L59-L62],
 which (TIL) can be 
[negative|https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime--]:

bq. This method can only be used to measure elapsed time and is not related to 
any other notion of system or wall-clock time. The value returned represents 
nanoseconds since some fixed but arbitrary origin time (perhaps in the future, 
so values may be negative). 

I haven't found a way to prove it, but presently this is my only plausible 
theory. I think we should switch NoSpamLogger to use {{currentTimeMillis}}. We 
know its non monotonic and may be less precise, but I think it fits the bill 
for the spirit of this class, where callers are specifying intervals on the 
order of whole seconds and minutes.

Please let me know if anyone has thoughts on that.

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15996
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: David Capwell
>            Assignee: Adam Holmberg
>            Priority: Normal
>             Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >           assert warning, 'Log message should be print for CAP and 
> > CAP_NOWARN policy'
> E           AssertionError: Log message should be print for CAP and 
> CAP_NOWARN policy
> E           assert []
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to