Thanks for the update, Stephen (I am now realizing that there are 2 Stephens
in this thread and that you were not having a dialogue in public with
yourself)! I misspoke earlier: the numbers I cited were latency numbers in
the event of a failure in the slave datastore. The actual numbers for the
majority (99+%) of requests are still a few hundred milliseconds. In most
cases, replication is even synchronous. It's only in the error cases that
the average replication lag is 3 minutes.

- The number of errors is less than a fraction of a fraction of a fraction
of a percent. I don't think a percentage is the right way to look at it,
though, because the errors are not evenly distributed. That is, a request
that blocks due to unavailability is likely to be followed by several more,
leading to a sudden spike of instability in an application that may depend
on many small, fast calls.

To answer your last two questions, it makes more sense to explain how
eventually consistent reads work:

1. Fire off an RPC to the master datastore service internally.
2. Wait a grace period for this RPC to return. This grace period is in the
tens of milliseconds. Most requests (99+%) will return in a time that is
far, far under this grace period.
3. If the grace period has been surpassed, we send out an RPC to the slave
datastore. Now we just wait to see which returns first: the original RPC to
the master datastore or the second RPC to the slave datastore. Again, in
most cases, the master datastore is likely to respond first simply because
it was given a head start.
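The three steps above follow a pattern often called a "hedged request". Below is a minimal Python sketch of that pattern, assuming each RPC can be modelled as a blocking zero-argument callable. `hedged_read`, `make_rpc`, and the 50 ms grace period are illustrative assumptions, not App Engine APIs or its actual tuning.

```python
import concurrent.futures as cf
import time


def hedged_read(primary, secondary, grace_period):
    """Sketch of an eventually consistent read with a hedged fallback.

    primary, secondary: hypothetical zero-argument callables standing in
    for the RPCs to the master and slave datastores.
    grace_period: seconds to wait on the master before also asking the slave.
    Returns (source, value) for whichever RPC completes first.
    """
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        # Step 1: fire off the RPC to the master datastore.
        master = pool.submit(primary)
        try:
            # Step 2: wait up to the grace period for the master to answer.
            # (An error raised by the master inside this window propagates.)
            return "master", master.result(timeout=grace_period)
        except cf.TimeoutError:
            pass
        # Step 3: grace period exceeded, so also query the slave and take
        # whichever reply arrives first. The master keeps its head start.
        slave = pool.submit(secondary)
        done, _ = cf.wait([master, slave], return_when=cf.FIRST_COMPLETED)
        # If both finished together, prefer the master's (fresher) answer.
        winner = master if master in done else slave
        source = "master" if winner is master else "slave"
        return source, winner.result()
    finally:
        # Don't block on the losing RPC; its thread finishes in the background.
        pool.shutdown(wait=False)


# Hypothetical stand-in for an RPC with a fixed simulated latency.
def make_rpc(value, delay):
    def rpc():
        time.sleep(delay)  # simulated network + service latency
        return value
    return rpc


if __name__ == "__main__":
    replica = make_rpc("possibly-stale", 0.0)
    # Fast master: answered inside the grace period.
    print(hedged_read(make_rpc("fresh", 0.0), replica, 0.05))
    # Stalled master: the slave answers first after the grace period.
    print(hedged_read(make_rpc("fresh", 0.5), replica, 0.05))
```

Because the slave is only queried after the grace period, a master that is merely a little slow (for example, 100 ms against a slower replica) can still win the race, which matches the "head start" point above.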

I really appreciate you asking for clarification. This is confusing, and
your questions have given us some good direction about how to improve the
documentation.

--
Ikai Lan
Developer Programs Engineer, Google App Engine
Blogger: http://googleappengine.blogspot.com
Reddit: http://www.reddit.com/r/appengine
Twitter: http://twitter.com/app_engine



On Fri, Jan 7, 2011 at 6:19 AM, Stephen <[email protected]> wrote:

>
>
> On Friday, January 7, 2011 12:04:36 AM UTC, Ikai Lan (Google) wrote:
>>
>> Stephen,
>>
>> The times I gave earlier were estimates of how much replication delay is
>> introduced in each replication scheme. The "eventually consistent" flag is
>> for reads only and dictates whether or not you care to read from the
>> "Master" datastore - reads will go to a slave if there are issues reaching
>> the master.
>>
>
> Ah, I misread your original answer as relating only to the high replication
> datastore - thanks.
>
> This is kind of a surprising answer: replication between data centres
> (hi-rep) takes 100ms; replication within a data centre (M/S) takes
> 3000-10000ms.  I suppose this is a trade-off: because the default for MS is
> strong consistency you can sacrifice replication lag for higher
> throughput...?
>
> After I asked the question I came across the original blog announcement for
> the EVENTUALLY_CONSISTENT option, and it already contains the answer:
>
>
> http://googleappengine.blogspot.com/2010/03/read-consistency-deadlines-more-control.html
>
>   "The secondary location may not have all of the changes made to the
> primary location at the time the data is read, but it should be very close.
> In the most common case, it will have all of the changes, and for a small
> percentage of requests, it may be a few hundred milliseconds to a few
> seconds behind."
>
> Which is somewhat different from an avg of 3000ms and up to 10000ms. The
> figures from the blog post suggest to me that a query with the
> EVENTUALLY_CONSISTENT flag is basically not-quite-transactional, whereas
> the latest figures of 3-10mins sound more like stale results. Is this a
> policy change?
>
> Regardless, it would be good to have these figures directly in the docs to
> help folks decide when they can use this feature and to decide between
> hi-rep and m/s.
>
> Couple of other questions related to this:
>
> - What percentage of reads block due to unavailability of a primary? That
> is, how often is setting the EVENTUALLY_CONSISTENT flag likely to make any
> difference at all?
>
> - For reads which do block, what is the average wait time for successful
> waits and the failure rate for reads which timeout (without an explicitly
> set deadline)?
>
> - And, how is unavailability of a primary determined? Is it a time out, and
> if so, how long? (I would use this figure to help determine a suitable
> deadline for queries which I want to fail over with eventual consistency.)
>
> Thanks.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
