Thank you - I think this driver solution may address a portion of the problem. Since the fix lives in the driver, is it correct to assume that although it could fix the issue within a single (client) session, it could not fix it for a pool of clients, where client A sends the first update and client B sends the second (because a driver session doesn't share memory/data between clients)? If so, I think this doesn't provide the full HA client solution, but it does help confirm that a 'reasonable' approach to the overall problem is software enforcement of a 'rigorously increasing timestamp', with the understood impact of drifting our timestamps out into the future (when a conflict is identified). It also sounds, from that JIRA ticket, like the continually updating increment can be milliseconds, not seconds (note we are not using batch statements, which I believe have/had a TS granularity bug in the past).
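To make the single-session question above concrete, here is a minimal sketch of an in-process monotonic generator. It is not the driver's actual implementation; the class name MonotonicMicros and the injected clock are hypothetical, and it assumes write timestamps are expressed in microseconds since the epoch, as is conventional for Cassandra. The last issued value lives in one JVM's memory, which is why a second client process would not see it.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch; not the DataStax driver's generator. */
final class MonotonicMicros {
    // Last timestamp handed out by THIS process only; other clients never see it.
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);
    // Injected millisecond clock (assumption: System::currentTimeMillis in real use).
    private final LongSupplier clockMillis;

    MonotonicMicros(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Returns a strictly increasing timestamp in microseconds. */
    long next() {
        long nowMicros = clockMillis.getAsLong() * 1000L;
        // If the clock stalled or went backwards, step 1 microsecond past the
        // previous value; this is what lets timestamps drift into the future.
        return last.updateAndGet(prev -> Math.max(prev + 1, nowMicros));
    }
}
```

With System::currentTimeMillis as the clock, two calls in the same millisecond differ by one microsecond rather than one second. Two separate client processes each hold their own `last` value, so nothing here orders client A's write before client B's.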

From: Alexandre Dutra <alexandre.du...@datastax.com>
To: Jen Smith <jendaboar...@yahoo.com>; "user@cassandra.apache.org" <user@cassandra.apache.org>
Sent: Thursday, May 12, 2016 11:28 AM
Subject: Re: client time stamp - force to be continuously increasing?

Hi,

Among the ideas worth exploring, please note that the DataStax Java driver for Cassandra now includes a modified version of its monotonic timestamp generators that will indeed strive to provide rigorously increasing timestamps, even in the event of a system clock skew (in which case, they would keep drifting in the future). Such generators obviously do not pretend to provide the same monotonicity guarantees as a vector clock, but have at least the advantage of being fairly easy to set up. See JAVA-727[1] for details.

Hope that helps,
Alexandre

[1] https://datastax-oss.atlassian.net/browse/JAVA-727

On Thu, May 12, 2016 at 7:35 PM Jen Smith <jendaboar...@yahoo.com> wrote:

to clarify - the currentRecordTs would be saved on a field on the record being persisted

From: Jen Smith <jendaboar...@yahoo.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Sent: Thursday, May 12, 2016 10:32 AM
Subject: client time stamp - force to be continuously increasing?

I'd like to get feedback/opinions on a possible work around for a timestamp + data consistency edge case issue.

Context for this question: When using client timestamp (default timestamp), on C* that supports it (v3 protocol), on occasion a record update is lost when executing updates in rapid succession (less than a second between updates). This is because C* by design (Last Write Wins) discards record updates with 'older' timestamp (from client), and server clocks (whether using client timestamp or c* node system timestamp) can move backwards, which results in data loss (eventual consistency is not reached).

For anyone needing more background, this blog has much of the detail https://aphyr.com/posts/299-the-trouble-with-timestamps , summarized as:

"Cassandra uses the JVM’s System.getCurrentTimeMillis for its time source, which is backed by gettimeofday. Pretty much every Cassandra client out there does something similar. That means that the timestamps for writes made in a session are derived either from a single Cassandra server clock, or a single app server clock. These clocks can flow backwards, for a number of reasons:

- Hardware wonkiness can push clocks days or centuries into the future or past.
- Virtualization can wreak havoc on kernel timekeeping.
- Misconfigured nodes may not have NTP enabled, or may not be able to reach upstream sources.
- Upstream NTP servers can lie.
- When the problem is identified and fixed, NTP corrects large time differentials by jumping the clock discontinously to the correct time.
- Even when perfectly synchronized, POSIX time itself is not monotonic.

... If the system clock goes backwards for any reason, Cassandra’s session consistency guarantees no longer hold."

This blog goes on to suggest a monotonic clock (zookeeper as a possibility, but slow), or better NTP synching (which leaves gaps).

My question is if this can be addressed via software by (using? abusing?) the client provided timestamp field and forcing it to be continuously increasing, and what unexpected issues may arise from doing so?

Specifically, my idea is to set a timestamp on the record when it is created (from the system time of the client doing the create).
Then, on subsequent updates, always set the default client timestamp to the result of:

    currentRecordTs = Math.max(currentRecordTs + standardDelta, System.currentTimeMillis());

(where standardDelta is probably 1 second)

Essentially this keeps a wall clock guard on the record itself, to prevent backwards timestamping and lost data, and to ensure that C* applies these updates in the proper order and does not discard any for being 'out of sequence' (i.e., persisted 'after' a record with a newer timestamp was already persisted).

One (acceptable) drawback is that this will result in a slightly inaccurate 'timestamp' being set whenever currentRecordTs + standardDelta > System.currentTimeMillis(), and that the skew could grow over time.

Would you please advise me of any other problems, downstream effects, pitfalls or data consistency issues this approach might cause? For example, will C* object if the 'quasi' timestamp gets 'too far' in the future?

More info - The system in question has LOCAL_QUORUM read/write consistency, and usually only one client (C* session) is updating a record at a time (although concurrent updates from multiple clients are allowed; LWW is expected for that scenario, and some ambiguity here is OK).

I apologize if this is a duplicate post to the list from me - I first sent this question when I was not subscribed to the list yet, so I am not sure whether it has been duplicated.

Thank you kindly for the advice,
J. Smith

--
Alexandre Dutra
Driver & Tools Engineer @ DataStax
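
For reference, the per-record guard proposed in the original question can be sketched as below. This is only an illustration under stated assumptions: the class RecordTimestampGuard, its method names, and the injected clock are hypothetical, and the conversion helper assumes the write timestamp sent to C* is in microseconds since the epoch (the usual convention), whereas the formula in the question works in milliseconds.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper; not part of Cassandra or the DataStax driver. */
final class RecordTimestampGuard {
    private final long standardDeltaMillis;  // e.g. 1000 for the 1 second in the question
    private final LongSupplier clockMillis;  // assumption: System::currentTimeMillis in real use

    RecordTimestampGuard(long standardDeltaMillis, LongSupplier clockMillis) {
        this.standardDeltaMillis = standardDeltaMillis;
        this.clockMillis = clockMillis;
    }

    /**
     * Applies currentRecordTs = max(currentRecordTs + standardDelta, now).
     * The result should be both stored on the record and used as the write timestamp.
     */
    long nextRecordTs(long currentRecordTs) {
        return Math.max(currentRecordTs + standardDeltaMillis, clockMillis.getAsLong());
    }

    /** Converts a millisecond record timestamp to microseconds for the write. */
    static long toWriteTimestampMicros(long recordTsMillis) {
        return recordTsMillis * 1000L;
    }
}
```

Because the guard value is read from the record itself, this only orders writes correctly if each client reads the latest currentRecordTs before updating. Two clients that read the same value concurrently would compute the same or overlapping timestamps, which matches the LWW ambiguity the question already accepts.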