[
https://issues.apache.org/jira/browse/CASSANDRA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780152#comment-13780152
]
Christopher Smith commented on CASSANDRA-6106:
----------------------------------------------
I agree there are two approaches to addressing the problem. I had assumed that,
given the design, this behavior was intentional (why code it up otherwise?),
but I'm glad to hear it was not.
Regarding clock drift: I'm happy to code up a version that recalibrates to
currentTimeMillis() every N seconds. I thought about doing it every call, but
that would mean calling currentTimeMillis() *and* nanoTime() on every write,
which seems... expensive. Can someone suggest an appropriate interval?
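
A minimal sketch of the periodic recalibration idea, assuming a single clock
object shared by all writers. The class name, the injectable time sources, and
the interval parameter are hypothetical, not existing Cassandra code, and the
right value of N is still the open question above. The "never go backwards"
guard is an extra assumption of this sketch, added so that a wall clock
stepping back at recalibration does not produce a smaller timestamp.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch: wall-clock micros, advanced between recalibrations by nanoTime(). */
final class RecalibratingMicrosClock {
    private final long intervalNanos;
    private final LongSupplier wallMillis; // normally System::currentTimeMillis
    private final LongSupplier monoNanos;  // normally System::nanoTime
    private long baseMicros;               // wall time at last recalibration, in micros
    private long baseNanos;                // nanoTime() reading at last recalibration
    private long lastMicros = Long.MIN_VALUE;

    RecalibratingMicrosClock(long intervalSeconds) {
        this(intervalSeconds, System::currentTimeMillis, System::nanoTime);
    }

    // Time sources are injectable so the behaviour can be exercised deterministically.
    RecalibratingMicrosClock(long intervalSeconds, LongSupplier wallMillis, LongSupplier monoNanos) {
        this.intervalNanos = TimeUnit.SECONDS.toNanos(intervalSeconds);
        this.wallMillis = wallMillis;
        this.monoNanos = monoNanos;
        recalibrate(monoNanos.getAsLong());
    }

    private void recalibrate(long nowNanos) {
        baseMicros = TimeUnit.MILLISECONDS.toMicros(wallMillis.getAsLong());
        baseNanos = nowNanos;
    }

    /** Returns a strictly increasing microsecond timestamp. */
    synchronized long nowMicros() {
        long nowNanos = monoNanos.getAsLong();
        // Only touch the wall clock once per interval, not on every write.
        if (nowNanos - baseNanos >= intervalNanos) {
            recalibrate(nowNanos);
        }
        long candidate = baseMicros + TimeUnit.NANOSECONDS.toMicros(nowNanos - baseNanos);
        // Assumption of this sketch: never hand out a value at or below the previous one.
        lastMicros = Math.max(candidate, lastMicros + 1);
        return lastMicros;
    }
}
```

Between recalibrations each call costs one nanoTime() read; the drift that can
accumulate is bounded by whatever interval is chosen.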
As for assigning an ID per node and storing it in the cell, I don't think we
should do that. First, we don't really have CAS operations. We have CAS
operations *if you can coordinate with a majority of nodes*. That means, in the
case of a split, you would not be able to set IDs for nodes in one portion of
the split, and possibly all portions. Worse, you might have situations where
for certain portions of the ring one part of the split has the majority, while
for other portions of the ring the other part of the split has it. In
general, if a new node comes on, I don't think we want to require that it have
access to a "quorum" of nodes in order to accept writes.
I can understand the appeal of a design that avoids storing extra bytes per
cell. We could store a proper type 1 UUID, but even that still has at
least theoretical collisions, and of course the space would be prohibitive.
Storing the ID in extra bits in the timestamp is NOT a good idea. Sure, you have
a few centuries of headroom, but the CQL protocol exposes this field's value.
Not only can clients read back the value and interpret it as microseconds, but
they can *write* a value to it (and probably should be doing so right now if they want
to avoid problems related to this bug). If they are writing client-generated
values (which, if you think about it, needn't be tied to the time at all), they
are in for a nasty surprise: two bytes of their values have been masked off.
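
To make the surprise concrete, here is a small sketch of what happens to a
client-supplied value if a node ID is stamped into two bytes of the timestamp.
The layout (low 16 bits), the class, and the method name are assumptions made
for illustration only; no such scheme exists in the code today, and the same
problem arises whichever two bytes are chosen.

```java
/** Hypothetical sketch: a node ID packed into two bytes of the 64-bit timestamp. */
final class TimestampMasking {
    static final int NODE_ID_BITS = 16; // assumed width: two bytes
    static final long NODE_ID_MASK = (1L << NODE_ID_BITS) - 1;

    /** Overwrites the low two bytes of the timestamp with the node ID. */
    static long withNodeId(long timestamp, int nodeId) {
        return (timestamp & ~NODE_ID_MASK) | (nodeId & NODE_ID_MASK);
    }

    public static void main(String[] args) {
        long clientValue = 0x12345678L; // a client-generated value, not tied to time
        long stored = withNodeId(clientValue, 0xABCD);
        // The client reads back something other than what it wrote.
        System.out.println(Long.toHexString(clientValue) + " -> " + Long.toHexString(stored));
    }
}
```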
I think it is much simpler to exploit the fact that when two nodes are talking,
they already have plenty of context for determining who should win. Then you
don't have to store anything extra with each cell. If you want to assign a
distinct ID to each node, do so, but it would seem the existing Cassandra
design already needs ways of uniquely identifying each active node, so
why bother creating a new system?
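
One way to read this suggestion, as a rough sketch: when two versions carry the
same timestamp, break the tie with an identifier both sides already know,
rather than with anything stored in the cell. The `Write` class, its `hostId`
field, and the `winner` method are hypothetical names for illustration, not
Cassandra APIs, and this assumes the originating node's identifier is available
from the context of the exchange.

```java
import java.util.Comparator;
import java.util.UUID;

/** Hypothetical sketch: deterministic tie-break on equal timestamps using a node identifier. */
final class TieBreak {
    static final class Write {
        final long timestampMicros;
        final UUID hostId;  // assumed to be known from the exchange, not stored per cell
        final String value;

        Write(long timestampMicros, UUID hostId, String value) {
            this.timestampMicros = timestampMicros;
            this.hostId = hostId;
            this.value = value;
        }
    }

    // Higher timestamp wins; on equal timestamps the larger host ID wins.
    static final Comparator<Write> ORDER =
            Comparator.comparingLong((Write w) -> w.timestampMicros)
                      .thenComparing(w -> w.hostId);

    /** Both sides pick the same winner regardless of argument order. */
    static Write winner(Write left, Write right) {
        return ORDER.compare(left, right) >= 0 ? left : right;
    }
}
```

Because the ordering is total and depends only on data both nodes can see,
each side reaches the same answer without coordination.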
> QueryState.getTimestamp() & FBUtilities.timestampMicros() reads current
> timestamp with System.currentTimeMillis() * 1000 instead of System.nanoTime()
> / 1000
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-6106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6106
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: DSE Cassandra 3.1, but also HEAD
> Reporter: Christopher Smith
> Priority: Minor
> Labels: collision, conflict, timestamp
> Attachments: microtimstamp.patch
>
>
> I noticed this blog post: http://aphyr.com/posts/294-call-me-maybe-cassandra
> mentioned issues with millisecond rounding in timestamps and was able to
> reproduce the issue. If I specify a timestamp in a mutating query, I get
> microsecond precision, but if I don't, I get timestamps rounded to the
> nearest millisecond, at least for my first query on a given connection, which
> substantially increases the possibilities of collision.
> I believe I found the offending code, though I am by no means sure this is
> comprehensive. I think we probably need a fairly comprehensive replacement of
> all uses of System.currentTimeMillis() with System.nanoTime().