On Wed, Feb 1, 2012 at 2:48 AM, Richard Watson <[email protected]> wrote:
>
> It seems obvious that fewer shards allow you to query across the full answer
> set with the least amount of queries. E.g. if you'll often query across
> users for a single multi-user customer, best would be to have a prefix that
> is shared for that customer, rather than one user. That way you don't have
> to sew the result sets together. (Does using a namespace handle this
> automatically or does it only affect keys? Where are the 'built-in'
> shard-points? Accounts must be sharded, otherwise the tablet covering the
> "current" timestamp would explode.)
The "sharding" in GAE-land works a little differently from the way you think. There's the notion of an Entity Group (EG), which is probably closest to a traditional data federation, but with a twist: you typically create zillions of tiny entity groups, say, one for each customer. The sharding is quite transparent; you only notice it when you write to the same EG too fast or when you try to run transactions across EGs.

The kind of sharding you would have to do to escape the hot tablet problem is sharding the values of a particular field. The index tablets span all EGs, so having many EGs does not spread out writes to a hot index. So you might create 4 "versions" of the login timestamp (say, each prefixed with a different letter) and then, when you want the last 100 people who logged in, issue four queries (one per prefix) and merge the results. In this case, you just pick a random prefix every time you write the field; there's no need to make it stable.

Jeff

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
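The prefix trick Jeff describes (write the login timestamp under a random one of 4 prefixes, then run one query per prefix and merge) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not App Engine code: a dict of `user_id -> stored value` stands in for the datastore index, the prefix letters `"abcd"`, the 12-digit zero padding, and the helpers `make_value`, `query_shard` and `last_logins` are all hypothetical. In a real app each `query_shard` call would correspond to one datastore query (a range over one prefix, sorted descending, limited to N). Asking each shard for the full N is what guarantees the merged result is correct: the global top N must lie inside the union of the per-shard top N.

```python
import heapq
import itertools
import random

PREFIXES = "abcd"  # hypothetical: 4 shards, one letter each


def make_value(timestamp: int, rng: random.Random) -> str:
    """Build the stored field value: a random shard prefix plus the timestamp.

    The prefix is chosen fresh on every write; it does not need to be stable.
    Zero-padding makes string order match numeric order within one prefix.
    """
    prefix = rng.choice(PREFIXES)
    return f"{prefix}{timestamp:012d}"


def query_shard(entities: dict, prefix: str, limit: int) -> list:
    """Stand-in for one index scan: values under `prefix`, newest first, `limit` rows."""
    rows = [(value, uid) for uid, value in entities.items() if value.startswith(prefix)]
    rows.sort(reverse=True)
    return rows[:limit]


def last_logins(entities: dict, limit: int) -> list:
    """Issue one query per prefix, then merge the sorted results into a global top `limit`."""
    per_shard = []
    for prefix in PREFIXES:
        rows = query_shard(entities, prefix, limit)
        # Strip the prefix so results from different shards compare by timestamp.
        per_shard.append([(int(value[1:]), uid) for value, uid in rows])
    # Each list is already in descending order, so a k-way merge suffices.
    merged = heapq.merge(*per_shard, reverse=True)
    return list(itertools.islice(merged, limit))


if __name__ == "__main__":
    rng = random.Random(42)
    entities = {}
    for ts, uid in enumerate([f"u{i}" for i in range(10)]):
        entities[uid] = make_value(ts, rng)
    # u3 logs in again later; overwriting the value replaces its old index entry,
    # possibly under a different prefix.
    entities["u3"] = make_value(100, rng)
    print(last_logins(entities, 3))  # [(100, 'u3'), (9, 'u9'), (8, 'u8')]
```

Because a user's entry is simply overwritten on each login, the random prefix only spreads writes across four index ranges; reads pay for it with four smaller queries and an in-memory merge instead of one query.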
