To clarify - the currentRecordTs would be saved in a field on the record being
persisted.
From: Jen Smith <jendaboar...@yahoo.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Sent: Thursday, May 12, 2016 10:32 AM
Subject: client time stamp - force to be continuously increasing?
I'd like to get feedback/opinions on a possible workaround for a timestamp +
data consistency edge case issue.
Context for this question:
When using client timestamps (the default timestamp), on C* versions that
support them (v3 protocol), a record update is occasionally lost when updates
are executed in rapid succession (less than a second between updates). This is
because C* by design (Last Write Wins) discards record updates carrying an
'older' timestamp, and clocks (whether the client's or a C* node's system
clock) can move backwards, which results in data loss (eventual consistency is
never reached).
For anyone needing more background, this blog has much of the detail
https://aphyr.com/posts/299-the-trouble-with-timestamps , summarized as:
"Cassandra uses the JVM’s System.getCurrentTimeMillis for its time source,
which is backed by gettimeofday. Pretty much every Cassandra client out there
does something similar. That means that the timestamps for writes made in a
session are derived either from a single Cassandra server clock, or a single
app server clock. These clocks can flow backwards, for a number of reasons:
- Hardware wonkiness can push clocks days or centuries into the future or past.
- Virtualization can wreak havoc on kernel timekeeping.
- Misconfigured nodes may not have NTP enabled, or may not be able to reach
  upstream sources.
- Upstream NTP servers can lie.
- When the problem is identified and fixed, NTP corrects large time
  differentials by jumping the clock discontinuously to the correct time.
- Even when perfectly synchronized, POSIX time itself is not monotonic.
... If the system clock goes backwards for any reason, Cassandra’s session
consistency guarantees no longer hold."
The blog goes on to suggest a monotonic clock (ZooKeeper is one possibility,
but slow) or better NTP syncing (which still leaves gaps).
My question is if this can be addressed via software by (using? abusing?) the
client provided timestamp field and forcing it to be continuously increasing,
and what unexpected issues may arise from doing so?
Specifically, my idea is to set a timestamp on the record when it is created
(from the system time of the client doing the create). Then, on subsequent
updates, always set the default client timestamp to the result of:
currentRecordTs = Math.max(currentRecordTs + standardDelta,
System.currentTimeMillis());
(where standardDelta is probably 1 second)
Essentially, this keeps a wall-clock guard on the record itself, to prevent
backwards timestamping / lost data, ensuring that C* applies these updates in
the proper order and does not discard any for being 'out of sequence' (i.e.,
persisted 'after' a newer-timestamped record was already persisted).
One (acceptable) drawback is that this results in a slightly inaccurate
'timestamp' being set whenever currentRecordTs + standardDelta >
System.currentTimeMillis(), and this skew could grow over time.
Would you please advise me of any other problems, downstream effects, pitfalls,
or data consistency issues this approach might cause? For example, will C*
object if the 'quasi' timestamp gets 'too far' into the future?
More info - The system in question uses LOCAL_QUORUM read/write consistency,
and usually only one client (C* session) is updating a given record at a time
(although concurrent updates from multiple clients are allowed - LWW is
expected in that scenario, and some ambiguity there is OK).
I apologize if this is a duplicate post to the list from me - I first sent this
question before I was subscribed to the list, so I am not sure whether it went
through.
Thank you kindly for the advice,
J. Smith