On Wed, Feb 1, 2012 at 2:48 AM, Richard Watson <[email protected]> wrote:
>
> It seems obvious that fewer shards allow you to query across the full answer
> set with the least amount of queries. E.g. if you'll often query across
> users for a single multi-user customer, best would be to have a prefix that
> is shared for that customer, rather than one user. That way you don't have
> to sew the result sets together. (Does using a namespace handle this
> automatically or does it only affect keys? Where are the 'built-in'
> shard-points? Accounts must be sharded, otherwise the tablet covering the
> "current" timestamp would explode.)

The "sharding" in GAE-land works a little differently from the way you think.

There's the notion of an Entity Group (EG), which is probably closest to a
traditional data federation, but with a twist:  you typically create
zillions of tiny entity groups, say, one for each customer.  The
sharding is quite transparent; you only notice it when you write to
the same EG too fast (the datastore sustains only about one write per
second per EG) or when you try to run transactions across EGs.
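To make "one entity group per customer" concrete, here is a minimal sketch. It assumes keys are modeled as plain tuples of (kind, id) pairs running from the root ancestor down to the entity. That mirrors the shape of datastore key paths but is not the App Engine API, and `entity_group`, `groups_touched` and `is_single_group` are hypothetical helpers. Keys that share a root belong to one EG. Keys under different roots span EGs, which is where you start to notice the sharding.

```python
# Hypothetical key model: a tuple of (kind, id) pairs from the root
# ancestor down to the entity. NOT the App Engine key API.

def entity_group(key):
    """An entity group is identified by the root (first) element of the path."""
    return key[0]

def groups_touched(keys):
    """Return the set of entity groups that a batch of keys spans."""
    return {entity_group(k) for k in keys}

def is_single_group(keys):
    """True if every key lives in the same entity group."""
    return len(groups_touched(keys)) == 1

# One entity group per customer: each user key is parented on its customer.
acme_alice = (("Customer", "acme"), ("User", "alice"))
acme_bob = (("Customer", "acme"), ("User", "bob"))
globex_carol = (("Customer", "globex"), ("User", "carol"))

print(is_single_group([acme_alice, acme_bob]))      # True
print(is_single_group([acme_alice, globex_carol]))  # False
```

Because all of a customer's users share a root, a query or a transaction scoped to one customer stays inside one EG. The cost is that all writes for that customer share the same per-EG write rate.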

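The kind of sharding you would have to do to escape the hot-tablet
problem is sharding the values of a particular field.  The index
tablets span all EGs: index rows are sorted by property value, so every
write of a "current" timestamp lands at the same end of the same tablet,
no matter which EG the entity belongs to.  So you might create 4
"versions" of the login timestamp (say, each prefixed with a different
letter) and then issue four queries when you want the last 100 people
who logged in, merging the four result sets by timestamp and keeping
the newest 100.  In this case, you just pick a random prefix every time
you write the field... there's no need to keep the same prefix for a
given user.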
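A minimal in-memory sketch of the prefixed-timestamp trick, assuming a single sorted Python list stands in for the index tablet and the four prefixes are the letters "a" to "d". `LoginIndex`, `record_login` and `last_logins` are hypothetical names, not App Engine APIs. Each write picks a random prefix, and a rewrite replaces the user's old index row, as an updated property value would. The read issues one newest-first range query per prefix and merges them.

```python
import bisect
import heapq
import itertools
import random

PREFIXES = "abcd"  # hypothetical: 4 shard prefixes, one letter each

class LoginIndex:
    """Stand-in for an index: (value, user_id) rows sorted by value across all EGs."""

    def __init__(self):
        self.rows = []     # sorted list of (indexed_value, user_id)
        self.current = {}  # user_id -> indexed value currently stored

    def record_login(self, user_id, ts, rng=random):
        # Updating the property removes the user's previous index row.
        old = self.current.get(user_id)
        if old is not None:
            self.rows.remove((old, user_id))
        # Random prefix on every write; it need not be stable per user.
        # Zero-padding keeps lexicographic order equal to numeric order.
        value = f"{rng.choice(PREFIXES)}|{ts:020d}"
        bisect.insort(self.rows, (value, user_id))
        self.current[user_id] = value

    def _shard_desc(self, prefix, limit):
        """One 'query': rows for a single prefix, newest first."""
        lo = bisect.bisect_left(self.rows, (prefix,))
        hi = bisect.bisect_left(self.rows, (chr(ord(prefix) + 1),))
        for value, user_id in itertools.islice(reversed(self.rows[lo:hi]), limit):
            yield int(value.split("|", 1)[1]), user_id

    def last_logins(self, n):
        # Issue one query per prefix, then merge the already-sorted results.
        per_shard = [self._shard_desc(p, n) for p in PREFIXES]
        merged = heapq.merge(*per_shard, reverse=True)
        return list(itertools.islice(merged, n))

idx = LoginIndex()
rng = random.Random(0)
for ts, user in enumerate(["u1", "u2", "u3", "u4", "u5", "u2"]):
    idx.record_login(user, ts, rng)
print(idx.last_logins(3))  # [(5, 'u2'), (4, 'u5'), (3, 'u4')]
```

The trade-off is the one described above. New writes now land at the tail of four separate key ranges instead of one, which spreads the load. Each read pays for four queries plus a merge, and each query only needs to fetch up to `n` rows.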

Jeff

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
