[jira] [Commented] (CASSANDRA-6106) QueryState.getTimestamp() & FBUtilities.timestampMicros() reads current timestamp with System.currentTimeMillis() * 1000 instead of System.nanoTime() / 1000

Christopher Smith (JIRA) Fri, 27 Sep 2013 10:53:55 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780152#comment-13780152
 ]


Christopher Smith commented on CASSANDRA-6106:
----------------------------------------------

I agree there are two approaches to addressing the problem. I had assumed that 
given the design, it was intended that this problem exist (why even code it up 
otherwise), but I'm glad if it was not.

Regarding clock drift: I'm happy to code up a version that recalibrates to 
currentTimeMillis() every N seconds. I thought about doing it every call, but 
that would mean calling currentTimeMillis() *and* nanoTime() on every write, 
which seems... expensive. Can someone suggest an appropriate interval?
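
To make the recalibration idea concrete, here is a minimal sketch, not the 
attached patch. The class name RecalibratingMicrosClock, its constructor 
parameters, and the anchor layout are all hypothetical. It reads 
currentTimeMillis() only when the interval has elapsed and otherwise advances 
the last wall-clock reading by the nanoTime() delta.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: a microsecond clock anchored to wall-clock time and
 * advanced by System.nanoTime() between recalibrations.
 */
final class RecalibratingMicrosClock {
    private final long recalibrateIntervalNanos;

    // {wall-clock micros, nanoTime() reading}, captured together.
    // Replaced as a unit so readers never see a mismatched pair.
    private volatile long[] anchor;

    RecalibratingMicrosClock(long interval, TimeUnit unit) {
        this.recalibrateIntervalNanos = unit.toNanos(interval);
        this.anchor = newAnchor();
    }

    private static long[] newAnchor() {
        return new long[] { System.currentTimeMillis() * 1000L, System.nanoTime() };
    }

    long timestampMicros() {
        long[] a = anchor;
        long elapsedNanos = System.nanoTime() - a[1];
        if (elapsedNanos >= recalibrateIntervalNanos) {
            // Several threads may recalibrate at once; the race is benign
            // because each thread publishes a self-consistent anchor.
            a = newAnchor();
            anchor = a;
            elapsedNanos = 0;
        }
        return a[0] + elapsedNanos / 1000L;
    }
}
```

With this shape, only calls that trigger a recalibration pay for both clock 
reads; every other call makes a single nanoTime() call. One caveat of the 
sketch: a recalibration can move the returned value backwards if the wall 
clock was adjusted or nanoTime() ran fast, so it does not guarantee 
monotonic timestamps.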

As for assigning an ID per node and storing it in the cell, I don't think we 
should do that. First, we don't really have CAS operations. We have CAS 
operations *if you can coordinate with a majority of nodes*. That means, in the 
case of a split, you would not be able to set IDs for nodes in one portion of 
the split, and possibly all portions. Worse, you might have situations where 
for certain portions of the ring one part of the split has the majority, while 
for other portions of the ring the other part of the split has it. In 
general, if a new node comes on, I don't think we want to require that it have 
access to a "quorum" of nodes in order to accept writes.

I can understand why a design that avoids storing extra bytes per cell is 
appealing. We could store a proper type 1 UUID, but even that still has at 
least theoretical collisions, and of course the space would be prohibitive.

Storing the ID in extra bits in the timestamp is NOT a good idea. Sure you have 
a few centuries of headroom, but the CQL protocol exposes this field's value. 
Not only can clients read back the value and interpret it as microseconds, but 
they can *write* a value to it (and probably should be doing so right now if 
they want to avoid problems related to this bug). If they are writing 
client-generated 
values (which, if you think about it, needn't be tied to the time at all), they 
are in for a nasty surprise: two bytes of their values have been masked off.
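
A small sketch of that surprise follows. The thread does not specify which 
bits would carry the node ID, so the layout here (node ID overwriting the low 
16 bits) and the names TimestampBitPacking and pack are assumptions made 
purely for illustration.

```java
/**
 * Hypothetical sketch: embedding a 16-bit node ID in the timestamp field.
 * The bit layout is an assumption; the proposal's exact layout is not given.
 */
final class TimestampBitPacking {
    static final int NODE_ID_BITS = 16;
    static final long NODE_ID_MASK = (1L << NODE_ID_BITS) - 1;

    // Overwrites the low 16 bits of the supplied timestamp with the node ID.
    static long pack(long timestamp, int nodeId) {
        return (timestamp & ~NODE_ID_MASK) | (nodeId & NODE_ID_MASK);
    }

    // Recovers the node ID from a packed value.
    static int nodeId(long packed) {
        return (int) (packed & NODE_ID_MASK);
    }
}
```

A client that writes its own value through the protocol and then reads it 
back gets something different whenever the bits it set in the reserved range 
differ from the node ID, regardless of which two bytes are reserved.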

I think it is much simpler to exploit the fact that when two nodes are talking, 
they already have plenty of context for determining who should win. Then you 
don't have to store anything extra with each cell. If you want to assign a 
distinct ID for each node, do so, but it would seem the existing Cassandra 
design already needs to have ways of uniquely identifying each active node, so 
why bother creating a new system?
                
> QueryState.getTimestamp() & FBUtilities.timestampMicros() reads current 
> timestamp with System.currentTimeMillis() * 1000 instead of System.nanoTime() 
> / 1000
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6106
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6106
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: DSE Cassandra 3.1, but also HEAD
>            Reporter: Christopher Smith
>            Priority: Minor
>              Labels: collision, conflict, timestamp
>         Attachments: microtimstamp.patch
>
>
> I noticed this blog post: http://aphyr.com/posts/294-call-me-maybe-cassandra 
> mentioned issues with millisecond rounding in timestamps and was able to 
> reproduce the issue. If I specify a timestamp in a mutating query, I get 
> microsecond precision, but if I don't, I get timestamps rounded to the 
> nearest millisecond, at least for my first query on a given connection, which 
> substantially increases the possibilities of collision.
> I believe I found the offending code, though I am by no means sure this is 
> comprehensive. I think we probably need a fairly comprehensive replacement of 
> all uses of System.currentTimeMillis() with System.nanoTime().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
