Out of curiosity, are you in the development stage?

I did a lot of over-thinking like this when I was in the development
stage, and after going to production I saw that half of that work had
been unnecessary.

On Aug 20, 11:50 pm, Jeff Schnitzer <[email protected]> wrote:
> The short version boiled down to two questions:
>
> 1) What is the maximum realistic throughput for memcache increment()
> on a single counter?
>
> 2) The documentation for the HRD says that queries across entity
> groups are eventually consistent.  Does this extend to __key__ queries
> as well?  For example, will this be eventually or strongly consistent?
>
> SELECT * FROM Thing WHERE __key__ >= KEY('Thing', 10) AND __key__ <=
> KEY('Thing', 20)
>
> -----
>
> The long version, to hopefully open a discussion:
>
> I'm selling various types of virtual goods, each of which have a fixed
> number available.  The constraints for each product are:
>
>  * I can sell at most N items
>  * I cannot under any circumstances sell N+1
>  * For some products, N might be hundreds of thousands
>  * The peak purchase rate is expected to be hundreds per second when a
> highly anticipated product goes on sale.  Maybe thousands per second.
>
> I have thought of several strategies for building this system, but the
> best one requires a counter that can:
>
> 1) Be updated hundreds of times per second (thousands?)
> 2) Give me a transactionally safe count, or at least temporarily err
> on the side of overcounting (not undercounting!)
>
> Between memcache and the datastore, I can build this:
>
>  * Memcache increment() gives me a current atomic count of purchases.
>  * Sharded counters give me a scalable persistent record of purchases
> (the purchase record can be parented by the count shard and updated in
> the same transaction)
>  * Memcache putIfUntouched() lets me safely initialize the memcache
> value as long as I can get a strongly consistent count from the shards
>  * By ordering memcache increments *before* shard increments for
> purchases, and shard decrements *before* memcache decrements for
> refunds, I can guarantee that datastore failures will at worst cause
> overcounts, which is fine.  A timeout on the memcache value means it
> will eventually become accurate again.
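Concretely, the ordering invariant in that last bullet can be sketched with in-memory stand-ins for memcache and the datastore (all class and function names below are hypothetical illustrations, not App Engine APIs):

```python
class FlakyDatastore:
    """Stand-in for the durable sharded counter; increment() may time out."""
    def __init__(self, fail=False):
        self.count = 0
        self.fail = fail

    def increment(self):
        if self.fail:
            raise RuntimeError("datastore timeout")
        self.count += 1


class Cache:
    """Stand-in for memcache's atomic increment/decrement on one key."""
    def __init__(self):
        self.count = 0

    def incr(self):
        self.count += 1
        return self.count

    def decr(self):
        self.count -= 1
        return self.count


def purchase(cache, store, limit):
    """Reserve one unit: bump memcache FIRST, then the durable shard."""
    if cache.incr() > limit:
        # Reservation exceeds the cap: roll back the cache bump, refuse sale.
        cache.decr()
        return False
    try:
        store.increment()  # durable record, written second
    except RuntimeError:
        # Datastore failed after the memcache bump, so the cached count
        # now OVERcounts until its timeout expires. That only makes us
        # refuse sales early -- we can never sell N+1 this way.
        return False
    return True
```

Because the cache increment happens first and is never rolled back on a datastore failure, the cached count is always >= the durable count, which is exactly the "overcount, never undercount" property needed.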
>
> First question:  What is the practical throughput of
> memcache.increment() on a single counter?  Thousands per second?
>
> Second:  The key is being able to get a strongly consistent sum of the
> sharded counters.  The example of a sharded counter in the App Engine
> documentation uses a query to fetch the shards, which will only work
> on the M/S datastore.
>
> Since get() is strongly consistent, the obvious solution is to give
> the shards predictable names and use batch get() for XYZ-shard1,
> XYZ-shard2, ... XYZ-shard100.  OK, but it's not quite as nice as being
> able to query for __key__ >= XYZ-shard1 and __key__ <=
> XYZ-shard100, because some of the shards might not exist... or I might
> want to decrease the number of shards for a running system.  Should I
> just not care about this case?
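For what it's worth, the "missing shards" case seems harmless with a batch get: absent keys simply don't come back, and can be treated as zero. A minimal sketch, using a plain dict as a stand-in for a strongly consistent get_multi() (the names here are made up for illustration):

```python
def shard_keys(counter, n):
    """Predictable key names: counter-shard1 .. counter-shardN."""
    return ["%s-shard%d" % (counter, i) for i in range(1, n + 1)]


def consistent_sum(get_multi, counter, n):
    """Sum the shards via one batch get; shards that were never
    written come back missing and contribute zero."""
    keys = shard_keys(counter, n)
    found = get_multi(keys)  # strongly consistent lookup by key
    return sum(found.get(k, 0) for k in keys)


# Stand-in backing store: only two of ten shards actually exist.
store = {"XYZ-shard1": 3, "XYZ-shard7": 2}
get_multi = lambda keys: {k: store[k] for k in keys if k in store}
```

Shrinking the shard count for a running system then just means summing over the larger of the old and new N, since stale shards still resolve by key.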
>
> I guess I've probably answered question #2... but if anyone else has
> any suggestions for how to build a system like this, I'd love to hear
> your thoughts.
>
> Thanks,
> Jeff

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.