On Sun, Sep 13, 2009 at 7:14 PM, Sanjit Jhala <[email protected]> wrote:
> Seems nice to have the ability to control the consistency behavior on
> a per-client/app basis instead of it being system wide.

Yeah, the behavior is controllable per mutator, which is already finer
granularity than per client/app.
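
Roughly how the proposed flag would hang off a mutator, as a sketch
only -- every name below is invented for illustration, not the actual
Hypertable client API:

    # Sketch only: EVENTUAL_OK and these names are hypothetical, not
    # the real Hypertable client API.
    IMMEDIATE = 0x0    # default: block/retry until the real owner acks
    EVENTUAL_OK = 0x1  # on failure, hand off to an alternate's proxy log

    class ServerDown(Exception):
        pass

    class Mutator(object):
        def __init__(self, table, flags=IMMEDIATE):
            self.table, self.flags = table, flags

        def set(self, row, column, value):
            try:
                self.table.update(row, column, value)
            except ServerDown:
                if self.flags & EVENTUAL_OK:
                    # persisted in the alternate's proxy log, replayed
                    # to the real owner later by maintenance threads
                    self.table.proxy_update(row, column, value)
                else:
                    raise  # strict mode: keep retrying the real owner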

> I think it's a good idea to have a design for eventual consistency
> in mind for now and implement as required post 1.0.

I was just sick of people picking on the write availability issue,
which was brought up in just about every conversation about
Hypertable :) Eventual consistency is easier to build on top of real
consistency, not vice versa.

__Luke
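
P.S. On Doug's out-of-order concern (quoted below): the self-ordering
trick is just to stamp every cell with a monotonically increasing
revision and resolve on that rather than on arrival order. A minimal
sketch, with illustrative names only:

    import itertools, time

    _counter = itertools.count()

    def next_revision():
        # wall-clock microseconds in the high bits, a per-client
        # counter in the low bits, so two writes from one client
        # never tie
        return (int(time.time() * 1e6) << 20) | (next(_counter) & 0xFFFFF)

    def apply_cell(store, key, value, rev):
        # last-writer-wins on revision, so a late replay from a proxy
        # log cannot clobber a newer value
        if key not in store or rev > store[key][1]:
            store[key] = (value, rev)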

> -Sanjit
>
> On Sun, Sep 13, 2009 at 3:54 PM, Luke <[email protected]> wrote:
>>
>> On Sun, Sep 13, 2009 at 2:13 PM, Doug Judd <[email protected]> wrote:
>> > This looks like a nice way to add eventual consistency to
>> > Hypertable. I like the fact that once it makes it into the proxy
>> > log it guarantees that the write will eventually make it into the
>> > system. The only issue I see is that updates for a cell could get
>> > written out of order. The client could end up writing a newer
>> > version of a cell before the proxy writer gets a chance to write
>> > the older version. The application can just write self-ordering
>> > entries using a monotonically increasing sequence number to solve
>> > this problem.
>>
>> Yeah, the client or the proxy (when writing to the proxy log) can
>> fill out the revision/timestamp field of the cells.
>>
>> > I do question the need for eventual consistency. I feel that this
>> > "concern" is theoretical. The problem is that people do not have a
>> > well-implemented Bigtable system to try out. I suspect that this
>> > perceived problem is much less of an issue than people think.
>> > Amazon developed this concept for their shopping cart. If one
>> > shopping cart update in every 1,000 spun for 30 seconds with a
>> > "System busy" message, would you really care? If 999 times out of
>> > 1,000 the shopping cart updated instantly, you would perceive the
>> > system as highly available.
>>
>> I'm with you on this one (shopping cart); I personally would suspect
>> my net connection first :) OTOH, if I'm a front-end/application
>> programmer who wants to log stuff directly into Hypertable and
>> doesn't really care about consistency (I must log the transactions
>> but won't read them until batch processing later), having to make
>> sure the call doesn't time out and lose the transaction in the log
>> is very annoying. I'd choose a back-end that makes my life easier.
>>
>> > I think we should wait on this until it is determined to be a real
>> > problem, not a theoretical one. It might also be a worthy exercise
>> > to do a back-of-the-envelope calculation based on failure-rate
>> > data to determine the real impact of failures on availability.
>>
>> I think the choice really belongs to the users. I'd suggest that we
>> add a "multiple path write proxy" (MPWP) feature (easy to implement,
>> and TBD of course) to the slides to assuage people's irrational (or
>> not) fear about write latency under recovery :)
>>
>> __Luke
>>
>> > - Doug
>> >
>> > On Sat, Sep 12, 2009 at 1:37 PM, Luke <[email protected]> wrote:
>> >>
>> >> One of the biggest "concerns" from potential "real-time" users of
>> >> Hypertable is the write latency spike when some nodes are down
>> >> and being recovered. Read latency/availability is usually masked
>> >> by the caching layer.
>> >>
>> >> Cassandra tries to solve the problem by using "hinted handoff"
>> >> (writing data tagged with a destination to an alternative node
>> >> when the destination node is down). Of course this mandates
>> >> relaxing the consistency guarantee to "eventual", which is a
>> >> trade-off many are willing to make.
>> >>
>> >> I just thought that it's not that hard to implement something
>> >> similar in Hypertable and give users a choice between immediate
>> >> and eventual consistency:
>> >>
>> >> When a mutator is created with a BEST_EFFORT/EVENTUAL_OK flag,
>> >> instead of the client retrying writes indefinitely when a
>> >> destination node is down, it writes to an alternative range
>> >> server with a special update flag, which persists the writes to a
>> >> proxy log. The maintenance threads on the alternative range
>> >> server will then try to empty the proxy log by retrying the
>> >> writes. Alternative range servers can be picked using either a
>> >> random scheme (sort the server list by the md5 of their IP
>> >> addresses; the alternatives are the next n servers) or a
>> >> location-aware (data center/rack) scheme. Note that this approach
>> >> works even if the alternative node dies while its proxy log is
>> >> not yet cleared.
>> >>
>> >> Thoughts?
>> >>
>> >> __Luke
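
P.P.S. The "next n servers" alternative-picking scheme from my
original mail quoted above is just a hash ring keyed on the md5 of the
server addresses; a minimal sketch:

    import hashlib

    def alternates(servers, down_server, n=2):
        # order the servers on a ring by md5 of their address; the
        # alternates for a down server are the next n servers clockwise
        ring = sorted(servers,
                      key=lambda s: hashlib.md5(s.encode()).hexdigest())
        i = ring.index(down_server)
        return [ring[(i + k) % len(ring)] for k in range(1, n + 1)]

    # e.g. alternates(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "10.0.0.2")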

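P.P.P.S. And the maintenance-thread side of the proxy log, with the
same disclaimer: proxy_log and locate() below are invented stand-ins
for whatever the range server would actually keep.

    import time

    class ServerDown(Exception):
        pass

    def drain_proxy_log(proxy_log, locate, batch=100):
        # keep replaying proxied writes to their real owners; an entry
        # leaves the log only after the owning range server acks it
        while not proxy_log.empty():
            for entry in proxy_log.peek(batch):
                try:
                    locate(entry.row).update(entry.cells)
                    proxy_log.remove(entry)
                except ServerDown:
                    pass  # owner still down; retry on the next pass
            time.sleep(5)  # back off between passes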