Hi Kant,

If you read the published papers about Paxos, you will most probably recognize that there is no way to "do it better". This is a conceptual limitation that follows from the nature of distributed systems and the CAP theorem. If you want A+P in the triangle, then C is very expensive. Cassandra is built mostly for A+P with tunable C. ACID databases are a completely different matter, as they are mostly either not partition tolerant, not highly available, or not scalable (in a distributed manner, not speaking of "monolithic super servers").
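To make "tunable C" a bit more concrete, here is a minimal Python sketch. It assumes the usual replica-overlap rule: a read is guaranteed to touch at least one replica that acknowledged the latest write when the read and write replica counts together exceed the replication factor. The function names are made up for illustration and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority quorum for the given replication factor."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True if every read replica set must intersect every write replica set.

    This is the R + W > RF rule; it says nothing about concurrent writers,
    which is where LWT (Paxos) comes in.
    """
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    # QUORUM writes + QUORUM reads overlap on at least one replica.
    print("QUORUM/QUORUM overlaps:", read_overlaps_write(rf, q, q))
    # ONE/ONE favours availability and latency over consistency.
    print("ONE/ONE overlaps:", read_overlaps_write(rf, 1, 1))
```

The more replicas you require per operation, the stronger the guarantee and the higher the latency and the lower the availability under partitions, which is the cost I mean above.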
There is no free lunch ...

2017-02-10 11:09 GMT+01:00 Kant Kodali <k...@peernova.com>:

> "That’s the safety blanket everyone wants but is extremely expensive,
> especially in Cassandra."
>
> yes LWT's are expensive. Are there any plans to make this better?
>
> On Fri, Feb 10, 2017 at 12:17 AM, Kant Kodali <k...@peernova.com> wrote:
>
>> Hi Jon,
>>
>> Thanks a lot for your response. I am well aware that the LWW != LWT but I
>> was talking more in terms of LWW with respective to LWT's which I believe
>> you answered. so thanks much!
>>
>> kant
>>
>> On Thu, Feb 9, 2017 at 6:01 PM, Jon Haddad <jonathan.had...@gmail.com>
>> wrote:
>>
>>> LWT != Last Write Wins. They are totally different.
>>>
>>> LWTs give you (assuming you also read at SERIAL) “atomic consistency”,
>>> meaning you are able to perform operations atomically and in isolation.
>>> That’s the safety blanket everyone wants but is extremely expensive,
>>> especially in Cassandra. The lightweight part, btw, may be a little
>>> optimistic, especially if a key is under contention. With regard to the
>>> “last write” part you’re asking about - w/ LWT Cassandra provides the
>>> timestamp and manages it as part of the ballot, and it always is
>>> increasing. See
>>> org.apache.cassandra.service.ClientState#getTimestampForPaxos.
>>> From the code:
>>>
>>> * Returns a timestamp suitable for paxos given the timestamp of the
>>> last known commit (or in progress update).
>>> * Paxos ensures that the timestamp it uses for commits respects the
>>> serial order of those commits. It does so
>>> * by having each replica reject any proposal whose timestamp is not
>>> strictly greater than the last proposal it
>>> * accepted. So in practice, which timestamp we use for a given proposal
>>> doesn't affect correctness but it does
>>> * affect the chance of making progress (if we pick a timestamp lower
>>> than what has been proposed before, our
>>> * new proposal will just get rejected).
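Inline note: the rule described in the code comment quoted above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: the `Replica` class, `propose`, and `timestamp_for_paxos` are hypothetical names, a ballot is reduced to a bare integer timestamp in microseconds, and the proposer simply takes the larger of its own clock and "last known timestamp + 1".

```python
import time
from typing import Optional


class Replica:
    """Toy acceptor that remembers the timestamp of the last proposal it accepted."""

    def __init__(self) -> None:
        self.last_accepted = 0

    def propose(self, timestamp: int) -> bool:
        # Reject anything that is not strictly greater than the last accepted proposal.
        if timestamp <= self.last_accepted:
            return False
        self.last_accepted = timestamp
        return True


def timestamp_for_paxos(last_known: int, now_micros: Optional[int] = None) -> int:
    """Pick a timestamp that is never behind the last known commit or proposal.

    A slow local clock therefore cannot produce a proposal that is doomed
    to be rejected for being too old.
    """
    if now_micros is None:
        now_micros = time.time_ns() // 1000
    return max(now_micros, last_known + 1)


if __name__ == "__main__":
    replica = Replica()
    print(replica.propose(100))   # accepted
    print(replica.propose(100))   # rejected: not strictly greater
    # A proposer whose clock lags (now=10) still gets a usable timestamp.
    ts = timestamp_for_paxos(replica.last_accepted, now_micros=10)
    print(ts, replica.propose(ts))
```

The point of the sketch is that the ordering comes from the replicas' accept/reject decision, not from how well the clocks agree; a bad clock only costs progress, not correctness.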
>>>
>>> Effectively paxos removes the ability to use custom timestamps and
>>> addresses clock variance by rejecting ballots with timestamps less than
>>> what was last seen. You can learn more by reading through the other
>>> comments and code in that file.
>>>
>>> Last write wins is a free for all that guarantees you *nothing* except
>>> the timestamp is used as a tiebreaker. Here we acknowledge things like the
>>> speed of light as being a real problem that isn’t going away anytime soon.
>>> This problem is sometimes addressed with event sourcing rather than
>>> mutating in place.
>>>
>>> Hope this helps.
>>>
>>> Jon
>>>
>>>
>>> On Feb 9, 2017, at 5:21 PM, Kant Kodali <k...@peernova.com> wrote:
>>>
>>> @Justin I read this article
>>> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0.
>>> And it clearly says
>>> Linearizable consistency can be achieved with LWT's. so should I assume
>>> the Linearizability in the context of the above article is possible
>>> with LWT's and synchronization of clocks through ntpd ? because LWT's also
>>> follow Last Write Wins. isn't it? Also another question does most of the
>>> production clusters do setup ntpd? If so what is the time it takes to sync?
>>> any idea
>>>
>>> @Micheal Schuler Are you referring to something like true time as in
>>> https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf?
>>> Actually I never heard of setting
>>> up GPS modules and how that can be helpful. Let me research on that but
>>> good point.
>>>
>>> On Thu, Feb 9, 2017 at 5:09 PM, Michael Shuler <mich...@pbandjelly.org>
>>> wrote:
>>>
>>>> If you require the best precision you can get, setting up a pair of
>>>> stratum 1 ntpd masters in each data center location with a GPS modules
>>>> is not terribly complex. Low latency and jitter on servers you manage.
>>>> 140ms is a long way away network-wise, and I would suggest that was a
>>>> poor choice of upstream (probably stratum 2 or 3) source.
>>>>
>>>> As Jonathan mentioned, there's no guarantee from Cassandra, but if you
>>>> need as close as you can get, you'll probably need to do it yourself.
>>>>
>>>> (I run several stratum 2 ntpd servers for pool.ntp.org)
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>> On 02/09/2017 06:47 PM, Kant Kodali wrote:
>>>> > Hi Justin,
>>>> >
>>>> > There are bunch of issues w.r.t to synchronization of clocks when we
>>>> > used ntpd. Also the time it took to sync the clocks was approx 140ms
>>>> > (don't quote me on it though because it is reported by our devops :)
>>>> >
>>>> > we have multiple clients (for example bunch of micro services are
>>>> > reading from Cassandra) I am not sure how one can achieve
>>>> > Linearizability by setting timestamps on the clients ? since there is no
>>>> > total ordering across multiple clients.
>>>> >
>>>> > Thanks!
>>>> >
>>>> >
>>>> > On Thu, Feb 9, 2017 at 4:16 PM, Justin Cameron <jus...@instaclustr.com
>>>> > <mailto:jus...@instaclustr.com>> wrote:
>>>> >
>>>> > Hi Kant,
>>>> >
>>>> > Clock synchronization is important - you should ensure that ntpd is
>>>> > properly configured on all nodes. If your particular use case is
>>>> > especially sensitive to out-of-order mutations it is possible to set
>>>> > timestamps on the client side using the drivers.
>>>> > https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/
>>>> > <https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/>
>>>> >
>>>> > We use our own NTP cluster to reduce clock drift as much as
>>>> > possible, but public NTP servers are good enough for most
>>>> > uses.
>>>> > https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/
>>>> > <https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/>
>>>> >
>>>> > Cheers,
>>>> > Justin
>>>> >
>>>> > On Thu, 9 Feb 2017 at 16:09 Kant Kodali <k...@peernova.com
>>>> > <mailto:k...@peernova.com>> wrote:
>>>> >
>>>> > How does Cassandra achieve Linearizability with “Last write
>>>> > wins” (conflict resolution methods based on time-of-day clocks) ?
>>>> >
>>>> > Relying on synchronized clocks are almost certainly
>>>> > non-linearizable, because clock timestamps cannot be guaranteed
>>>> > to be consistent with actual event ordering due to clock skew.
>>>> > isn't it?
>>>> >
>>>> > Thanks!
>>>> >
>>>> > --
>>>> >
>>>> > Justin Cameron
>>>> >
>>>> > Senior Software Engineer | Instaclustr
>>>> >
>>>> >
>>>> > This email has been sent on behalf of Instaclustr Pty Ltd
>>>> > (Australia) and Instaclustr Inc (USA).
>>>> >
>>>> > This email and any attachments may contain confidential and legally
>>>> > privileged information. If you are not the intended recipient, do
>>>> > not copy or disclose its content, but please reply to this email
>>>> > immediately and highlight the error to the sender and then
>>>> > immediately delete the message.
>>>> >
>>>>
>>>
>>
>

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer