On Sun, Sep 13, 2009 at 7:14 PM, Sanjit Jhala <[email protected]> wrote:
> It seems nice to have the ability to control the consistency behavior
> on a per-client/app basis instead of it being system wide.

Yeah, the behavior is controllable per mutator, which is already finer
granularity than per client/app.
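
Roughly, I picture the per-mutator choice looking something like this (just
a sketch; EVENTUAL_OK is the flag name floated in this thread, not an
existing constant, and the create_mutator() details are from memory):

    #include <Hypertable/Lib/Client.h>  // Client, Table, TableMutator (path from memory)

    using namespace Hypertable;

    // Proposed flag from this thread -- defined here only so the sketch
    // compiles; it does not exist in the API today.
    const unsigned MUTATOR_FLAG_EVENTUAL_OK = 0x10;

    int main() {
      ClientPtr client(new Client("/opt/hypertable/current"));  // example install dir
      TablePtr table = client->open_table("LogTable");          // example table

      // Default mutator: keeps the usual guarantee, i.e. writes retry until
      // the owning range server comes back.
      TableMutatorPtr strict(table->create_mutator());

      // Proposed: opt into eventual consistency for this mutator only, so a
      // down range server just diverts its writes to a proxy log elsewhere.
      // (Assumes a flags argument on create_mutator; signature from memory.)
      TableMutatorPtr relaxed(table->create_mutator(0, MUTATOR_FLAG_EVENTUAL_OK));

      return 0;
    }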

> I think it's a good idea to have a design for eventual consistency in mind
> for now and implement it as required post 1.0.

I was just sick of people picking on the write availability issue,
which gets brought up in just about every conversation about Hypertable :)
Eventual consistency is easier to build on top of real consistency,
not vice versa.

__Luke

> -Sanjit
>
> On Sun, Sep 13, 2009 at 3:54 PM, Luke <[email protected]> wrote:
>>
>> On Sun, Sep 13, 2009 at 2:13 PM, Doug Judd <[email protected]> wrote:
>> > This looks like a nice way to add eventual consistency to Hypertable.  I
>> > like the fact that once it makes it into the proxy log, it guarantees
>> > that the write will eventually make it into the system.  The only issue
>> > I see is that updates for a cell could get written out-of-order.  The
>> > client could end up writing a newer version of a cell before the proxy
>> > writer gets a chance to write the older version.  The application can
>> > just write self-ordering entries using a monotonically increasing
>> > sequence number to solve this problem.
>>
>> Yeah, the client or the proxy (when writing to the proxy log) can fill out
>> the revision/timestamp field of the cells.
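>>
>> Roughly, the writer could stamp each cell itself (a minimal sketch; the
>> KeySpec/TableMutator usage is from memory, and next_sequence_number() is
>> just a placeholder for whatever monotonic counter the app keeps):
>>
>>     #include <Hypertable/Lib/Client.h>  // KeySpec, TableMutator (path from memory)
>>     #include <string>
>>
>>     using namespace Hypertable;
>>
>>     int64_t next_sequence_number();  // placeholder: app-supplied monotonic counter
>>
>>     // Write one cell with a client-assigned revision so that replayed
>>     // proxy-log writes stay self-ordering: a newer write always carries a
>>     // larger value, so an older version replayed later can't shadow it.
>>     void write_versioned(TableMutator *mutator, const std::string &row,
>>                          const std::string &value) {
>>       KeySpec key;
>>       key.row = row.c_str();
>>       key.row_len = row.length();
>>       key.column_family = "cart";               // example column family
>>       key.timestamp = next_sequence_number();   // monotonically increasing
>>       mutator->set(key, value.data(), value.length());
>>     }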
>>
>> > I do question the need for eventual consistency.  I feel that this
>> > "concern" is theoretical.  The problem is that people do not have a
>> > well-implemented Bigtable implementation to try out.  I suspect that
>> > this perceived problem is much less of an issue than people think.
>> > Amazon developed this concept for their shopping cart.  If, for one out
>> > of every 1000 shopping cart updates, the system spun for 30 seconds with
>> > a "System busy" message, would you really care?  If 999 times out of
>> > 1000 the shopping cart updated instantly, you would perceive the system
>> > as highly available.
>>
>> I'm with you on this one (shopping cart); I personally would suspect my
>> net connection first :) OTOH, if I'm a front-end/application programmer
>> who wants to log stuff directly into Hypertable and doesn't really care
>> about consistency (must log the transactions but won't read them until
>> batch processing later), having to make sure the call doesn't time out
>> and drop the transaction is very annoying. I'd choose a back-end that
>> makes my life easier.
>>
>> > I think we should wait on this until it is determined to be a real
>> > problem, not a theoretical one.  It might also be a worthy exercise to
>> > do a back-of-the-envelope calculation based on failure rate data to
>> > determine the real impact of failures on availability.
>>
>> I think the choice really belongs to the users. I'd suggest that we add a
>> "multiple path write proxy" (MPWP) feature (easy to implement and TBD of
>> course) to the slides to assuage people's irrational (or not) fear about
>> write latency under recovery :)
>>
>> __Luke
>>
>> > - Doug
>> >
>> > On Sat, Sep 12, 2009 at 1:37 PM, Luke <[email protected]> wrote:
>> >>
>> >> One of the biggest "concerns" from potential "real-time" users of
>> >> Hypertable is the write latency spike when some nodes are down and
>> >> being recovered. Read latency/availability issues are usually masked by
>> >> the caching layer.
>> >>
>> >> Cassandra tries to solve the problem by using "hinted handoff" (writing
>> >> data tagged with its destination to an alternative node when the
>> >> destination node is down). Of course this mandates relaxing the
>> >> consistency guarantee to "eventual", which is a trade-off many are
>> >> willing to make.
>> >>
>> >> I just thought that it's not that hard to implement something similar
>> >> in Hypertable and give users a choice between immediate and eventual
>> >> consistency:
>> >>
>> >> When a mutator is created with a BEST_EFFORT/EVENTUAL_OK flag, instead
>> >> of the client retrying writes indefinitely when a destination node is
>> >> down, it tries to write to an alternative range server with a special
>> >> update flag, which persists the writes to a proxy log. The maintenance
>> >> threads on the alternative range server then try to empty the proxy log
>> >> by retrying the writes. Alternative range servers can be picked using
>> >> either a random scheme (sort the server list by the md5 of their IP
>> >> addresses; the alternatives are the next n servers) or a location-aware
>> >> (data center/rack) scheme. Note this approach works even if the
>> >> alternative node dies before its proxy logs are cleared.
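>> >>
>> >> (For the random scheme, picking alternatives could look roughly like
>> >> the sketch below; std::hash stands in for md5 just to keep it
>> >> self-contained, and nothing here is existing Hypertable code.)
>> >>
>> >>     #include <algorithm>
>> >>     #include <functional>
>> >>     #include <string>
>> >>     #include <vector>
>> >>
>> >>     // std::hash stands in for an md5 digest of the IP address, purely
>> >>     // to keep this sketch self-contained.
>> >>     static size_t digest(const std::string &ip) {
>> >>       return std::hash<std::string>()(ip);
>> >>     }
>> >>
>> >>     // Sort all range servers onto a ring by the digest of their IP and
>> >>     // return the next n servers after the failed one as proxy
>> >>     // candidates.
>> >>     std::vector<std::string>
>> >>     pick_proxies(std::vector<std::string> servers,
>> >>                  const std::string &failed_ip, size_t n) {
>> >>       std::sort(servers.begin(), servers.end(),
>> >>                 [](const std::string &a, const std::string &b) {
>> >>                   return digest(a) < digest(b);
>> >>                 });
>> >>       size_t pos = std::find(servers.begin(), servers.end(), failed_ip)
>> >>                    - servers.begin();
>> >>       std::vector<std::string> proxies;
>> >>       for (size_t i = 1; i <= n && i < servers.size(); ++i)
>> >>         proxies.push_back(servers[(pos + i) % servers.size()]);
>> >>       return proxies;
>> >>     }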
>> >>
>> >> Thoughts?
>> >>
>> >> __Luke
>> >>
>> >
>> >
>> > >
>> >
>>
>>
>
>
> >
>
