[ 
https://issues.apache.org/jira/browse/CASSANDRA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956938#comment-13956938
 ] 

Benedict edited comment on CASSANDRA-6106 at 4/1/14 7:34 PM:
-------------------------------------------------------------

It doesn't look safe to me to simply grab gtod.wall_time_sec anyway, even if we 
could find its location, since the nanos value gets repaired by another call 
after it is read. We could investigate further, but for the time being I have a 
reasonably straightforward solution 
[here|http://github.com/belliottsmith/cassandra/tree/6106-microstime]

I started by simply calling the rt clock_gettime method through JNA, which 
unfortunately clocks in at a heavy 7 micros; since nanoTime and 
currentTimeMillis each cost < 0.03 micros, this seemed a little unacceptable. 
So instead I've opted to periodically (once per second) grab the latest micros 
time via the best method available (clock_gettime if we can, 
currentTimeMillis * 1000 otherwise) and use it to reset the offset. However, 
to ensure a smooth transition, I:

# Cap the rate of change at 50ms per second
# Ensure it never leaps back in time, at least on any given thread (there is 
no way to guarantee anything stronger than this)
# Only apply a change if it is at least 1ms out, to avoid noise (we should 
possibly tighten this to 100 micros, or make it dependent on the resolution of 
the time library we're using)
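The three rules above might look roughly like the following sketch. This is not the actual patch: the class and field names are mine, it uses the portable currentTimeMillis * 1000 fallback as the wall-clock source, and it assumes resync() is driven by a once-per-second scheduled task.

```java
// Hypothetical sketch of the offset-smoothing rules described above.
// Names are illustrative; the real patch lives in the branch linked earlier.
public class SmoothedMicrosClock {
    private static final long MAX_STEP_MICROS = 50_000;       // rule 1: 50ms per (1s) resync
    private static final long APPLY_THRESHOLD_MICROS = 1_000; // rule 3: ignore < 1ms error

    // Invariant: nowMicros ~= System.nanoTime() / 1000 + offsetMicros.
    // Written only by the single resync thread, read by many; hence volatile.
    private volatile long offsetMicros;

    // Per-thread floor so time never goes backwards on any given thread (rule 2).
    private final ThreadLocal<long[]> lastReturned =
            ThreadLocal.withInitial(() -> new long[1]);

    public SmoothedMicrosClock() {
        offsetMicros = sampleWallMicros() - System.nanoTime() / 1000;
    }

    /** Best wall-clock micros available; here, the millis * 1000 fallback. */
    private static long sampleWallMicros() {
        return System.currentTimeMillis() * 1000;
    }

    /** Called roughly once per second by a scheduled task. */
    void resync() {
        long target = sampleWallMicros() - System.nanoTime() / 1000;
        long error = target - offsetMicros;
        if (Math.abs(error) < APPLY_THRESHOLD_MICROS)
            return;                                        // rule 3: below noise threshold
        long step = Math.max(-MAX_STEP_MICROS,
                             Math.min(MAX_STEP_MICROS, error)); // rule 1: cap the slew rate
        offsetMicros += step;
    }

    /** Current time in micros; monotonic per thread (rule 2). */
    public long nowMicros() {
        long now = System.nanoTime() / 1000 + offsetMicros;
        long[] last = lastReturned.get();
        if (now < last[0])
            now = last[0]; // a resync stepped us backwards; hold until real time catches up
        last[0] = now;
        return now;
    }
}
```

The per-thread floor is the cheapest way to honour rule 2: after a backwards resync step, each thread simply returns its own last value until nanoTime catches up.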

The result is a method that costs around the same as a raw call to 
System.nanoTime() but gives pretty decent accuracy. Obviously any method that 
derives an offset using nanos and a call that takes ~7 micros to return will 
have an inherent inaccuracy, but no more than the 7 micro direct call would 
itself, and the inaccuracy will be consistent given the jitter reduction I'm 
applying. At startup we also sample the offset 10k times, derive a 90%ile for 
the elapsed time fetching the offset (we ignore any future offsets whose 
samples take more than twice this period), and average all of the samples 
within the 90%ile.




> QueryState.getTimestamp() & FBUtilities.timestampMicros() reads current 
> timestamp with System.currentTimeMillis() * 1000 instead of System.nanoTime() 
> / 1000
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6106
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6106
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: DSE Cassandra 3.1, but also HEAD
>            Reporter: Christopher Smith
>            Assignee: Benedict
>            Priority: Minor
>              Labels: timestamps
>             Fix For: 2.1 beta2
>
>         Attachments: microtimstamp.patch, microtimstamp_random.patch, 
> microtimstamp_random_rev2.patch
>
>
> I noticed this blog post: http://aphyr.com/posts/294-call-me-maybe-cassandra 
> mentioned issues with millisecond rounding in timestamps and was able to 
> reproduce the issue. If I specify a timestamp in a mutating query, I get 
> microsecond precision, but if I don't, I get timestamps rounded to the 
> nearest millisecond, at least for my first query on a given connection, which 
> substantially increases the possibilities of collision.
> I believe I found the offending code, though I am by no means sure this is 
> comprehensive. I think we probably need a fairly comprehensive replacement of 
> all uses of System.currentTimeMillis() with System.nanoTime().



--
This message was sent by Atlassian JIRA
(v6.2#6252)