The short version boils down to two questions:
1) What is the maximum realistic throughput for memcache increment()
on a single counter?
2) The documentation for the HRD says that queries across entity
groups are eventually consistent. Does this extend to __key__ queries
as well? For example, will the following query be eventually or
strongly consistent?
SELECT * FROM Thing WHERE __key__ >= KEY('Thing', 10) AND __key__ <=
KEY('Thing', 20)
-----
The long version, to hopefully open a discussion:
I'm selling various types of virtual goods, each of which has a fixed
number available. The constraints for each product are:
* I can sell up to exactly N items
* I cannot under any circumstances sell N+1
* For some products, N might be hundreds of thousands
* The peak purchase rate is expected to be hundreds per second when a
highly anticipated product goes on sale. Maybe thousands per second.
I have thought of several strategies for building this system, but the
best one requires a counter that can:
1) Be updated hundreds of times per second (thousands?)
2) Give me a transactionally safe count, or at least temporarily err
on the side of overcounting (not undercounting!)
Between memcache and the datastore, I can build this:
* Memcache increment() gives me a current atomic count of purchases.
* Sharded counters give me a scalable persistent record of purchases
(the purchase record can be parented by the count shard and updated in
the same transaction)
* Memcache putIfUntouched() lets me safely initialize the memcache
value as long as I can get a strongly consistent count from the shards
* By ordering memcache increments *before* shard increments for
purchases, and shard decrements *before* memcache decrements for
refunds, I can guarantee that datastore failures will at worst cause
overcounts, which is fine. A timeout on the memcache value means it
will eventually become accurate again.
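To make the ordering argument concrete, here's a minimal stand-alone sketch (not App Engine code): a plain dict stands in for memcache, a counter stands in for the sharded datastore total, and `DatastoreError` is a made-up stand-in for a failed transaction. The only point is the failure behavior: memcache is incremented first, and deliberately *not* rolled back on a datastore failure, so the cached count can only ever be too high.

```python
# Stand-ins for the real App Engine APIs -- illustration only.
fake_memcache = {"sold": 0}   # plays the role of memcache.incr/decr
shard_total = 0               # plays the role of the summed shard count

class DatastoreError(Exception):
    """Stand-in for a failed datastore transaction."""

def purchase(limit, datastore_fails=False):
    global shard_total
    # 1) Reserve in memcache *before* touching the datastore.
    fake_memcache["sold"] += 1
    if fake_memcache["sold"] > limit:
        fake_memcache["sold"] -= 1   # sold out: roll back the reservation
        return False
    # 2) Only then record the purchase in the (sharded) datastore.
    try:
        if datastore_fails:
            raise DatastoreError("transaction failed")
        shard_total += 1
        return True
    except DatastoreError:
        # Deliberately do NOT decrement memcache here: a datastore
        # failure leaves the cached count too high (overcount),
        # never too low (undercount).
        return False

purchase(limit=10)                        # succeeds: memcache 1, shards 1
purchase(limit=10, datastore_fails=True)  # fails:    memcache 2, shards 1
assert fake_memcache["sold"] >= shard_total
```

The invariant at the bottom is the whole game: the cached count dominates the persistent count, and the memcache timeout eventually resets it from the authoritative shard sum.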
First question: What is the practical throughput of
memcache.increment() on a single counter? Thousands per second?
Second: The key is being able to get a strongly consistent sum of the
sharded counters. The example of a sharded counter in the appengine
documentation uses a query to fetch the shards, which will only work
on the M/S datastore.
Since get() is strongly consistent, the obvious solution is to give
the shards predictable names and use batch get() for XYZ-shard1,
XYZ-shard2, ... XYZ-shard100. Ok, but it's not quite as nice as being
able to query for __key__ >= XYZ-shard1 AND __key__ <=
XYZ-shard100, because some of the shards might not exist... or I might
want to decrease the number of shards for a running system. Should I
just not care about this case?
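For what it's worth, missing shards seem harmless with the batch get() approach, since absent entities just come back as None and can be skipped. A stand-alone sketch of the idea (a dict stands in for the datastore; the "XYZ-shardN" naming scheme is the hypothetical one from above -- in real App Engine code this would be a db.get() on a list of Keys):

```python
# Summing sharded counters by predictable key name. `datastore` is a
# dict standing in for a strongly consistent batch get(); a real
# db.get() on a list of Keys likewise returns None for entities
# that don't exist.
NUM_SHARDS = 100

datastore = {
    "XYZ-shard1": 40,
    "XYZ-shard2": 25,
    # shards 3..100 were never created -- that's fine
}

def consistent_total(prefix, num_shards):
    keys = ["%s-shard%d" % (prefix, i) for i in range(1, num_shards + 1)]
    # Batch get; missing shards come back as None and count as zero.
    values = [datastore.get(k) for k in keys]
    return sum(v for v in values if v is not None)

print(consistent_total("XYZ", NUM_SHARDS))  # -> 65
```

Shrinking the shard count is the one wrinkle: the fetch range has to stay at the historical maximum (or the old shards have to be folded into the survivors in a transaction) so no shard's count is ever silently dropped from the sum.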
I guess I've probably answered question #2... but if anyone else has
any suggestions for how to build a system like this, I'd love to hear
your thoughts.
Thanks,
Jeff