Hi Peter,
Thanks first for re-stating the problem with more precise language!
As for your solution, sharding into many different ranking groups is
an interesting approach. One could imagine maintaining a tree of
ranking groups; searching/inserting/querying would require log N reads/
writes. All of it would have to be wrapped up in transactions, which
makes me a bit queasy. Not sure I want to venture down that path when
a cheap server running MySQL can handle the problem so nicely.
I have also been evaluating CouchDB as a next-gen database solution.
Hopefully Google is tracking their progress as well - as the world
leaders in map-reduce implementations, one would expect datastore to
boast features similar to how CouchDB handles its views. To do a
histogram in CouchDB,
map: function(doc) { emit(score, 1); }
reduce: function(keys, values) { return sum(values) }
CouchDB efficiently diffs this view index with every incoming write,
so finding a rank is a quick query.
Ben
On Oct 25, 5:27 pm, Peter Recore <[EMAIL PROTECTED]> wrote:
> I'll try to state a more formal version of the problem, then offer my
> suggestion.
>
> Problem:
>
> Assume that there is an entity type called Player, and each Player has
> a score. A score is an integer. For any given Player P with a score
> of X, we want to know how many other Players have scores that are
> greater than or equal to X. This is their Rank.
>
> Let's say we have the following Players and scores:
>
> Peter 30
> Ben 27
> Sylvain 42
> Marzia 50
> Ryan 12
>
> Peter's Rank would be 3 because Marzia, Sylvain and Peter all have
> scores greater than or equal to Peter's score of 30.
> Marzia's Rank would be 1, and Ryan's would be 5. The tricky part is
> that Ben needs to do this efficiently for hundreds of thousands of
> items ( with a fairly frequent number of updates i assume)
>
> Getting this kind of answer is relatively simple on a SQL database,
> though I don't honestly know how scalable the typical count(*) > X
> solutions are. With the proper indexes, they could probably be very
> fast, but i don't know if you'd get lots of contention on writing to
> that table/index.
>
> This does seem like a problem that any game with the concept of 'high
> score' built in app engine would need to solve. (actually, this
> problem probably generalizes even further) I'm pretty sure I will
> need something like this at some point myself. So I might as well
> take a stab at it now. Here are my initial thoughts... Encouraging or
> disparaging comments welcome.
>
> To solve this problem with the GAE datastore, I propose building a
> ShardedRanker.
> The lowest level item we need to consider is a RankedItem. A
> RankedItem has a key, a value, and a reference to a RankingGroup. The
> key would be a player's id, and the value would be a player's score.
>
> A ShardedRanker would have references to a number of RankingGroups.
> Each RankingGroup would contain a list of RankedItems, and be
> responsible for knowing the relative rank of each item it contains.
> Each RankingGroup would also be responsible for knowing the highest
> and lowest scores its items contain.
>
> When we want to know the rank of a particular item, we can get its
> overall rank by combining its relative rank within its RankingGroup
> with the total number of items contained in higher ranked Groups.
> This will require 1 datastore read to get the info from the
> RankingGroup, plus one read for each of the RankingGroups ranked
> higher than the 'target' group.
>
> When we insert a new RankedItem, we will need to update the
> RankingGroup it belongs to, and maybe the index which tells us which
> Group contains which ranges of scores. At some point, if a
> RankingGroup gets to big, it would probably be split.
>
> There should be lots of opportunity to memcache things here, reducing
> the need for datastore reads. As the total number of items increases,
> it might be necessary to create multiple levels of RankingGroups.
>
> -peter
>
> On Oct 24, 5:49 pm, yejun <[EMAIL PROTECTED]> wrote:
>
> > His problem is a count needed for every single user. So a single
> > change will cause a massive update on all count.
> > On a traditional database it is the position on the index.
>
> > On Oct 24, 5:33 pm, Sylvain <[EMAIL PROTECTED]> wrote:
>
> > > Sorry, I don't understand exactly the problem but if you want to have
> > > this SQL
>
> > > select count(*) from entries where points > 100
>
> > > You can create many "sharded counter"
> > > and from the code use this :
>
> > > increment("point_%i" % point)
>
> > > Then, if you want to count, just : get_count("point_%i" % point)
>
> > > It will create a lot of of "counter" but it will works fine.
>
> > > Then if you want to count for points > 100
> > > You will have to do many any times : get_count("point_%i" % point) for
> > > point from x to y
> > > else you can create steps : a counter from 100 to 150 , etc,...
>
> > > I think the only solution to do count(*) (if count(*) > 1000) is to
> > > use sharded counter
>
> > > if count(*) < 1000 then you can use the "count" keyword.
>
> > > Regards.
>
> > > On 24 oct, 21:39, Ben Nevile <[EMAIL PROTECTED]> wrote:
>
> > > > Hi Sylvain - thanks for the pointer, and although I now know a cool
> > > > sharding hack for a shared resource, I don't see how the technique can
> > > > be applied to my problem. To summarize, I could have a model indexed
> > > > by a single entity called "points", and I want to be able to tell
> > > > someone how they rank in terms of points. For someone with say 100
> > > > points, in SQL I would do something like
>
> > > > select count(*) from entries where points > 100
>
> > > > Ben
>
> > > > On Oct 24, 9:51 am, Sylvain <[EMAIL PROTECTED]> wrote:
>
> > > > > Check this
> > > > > :http://groups.google.com/group/google-appengine/browse_frm/thread/e13...
>
> > > > > and code here :http://paste.blixt.org/1581
>
> > > > > On 24 oct, 08:52, Ben Nevile <[EMAIL PROTECTED]> wrote:
>
> > > > > > Hi,
>
> > > > > > I have recently started using GAE for components of a large sports-
> > > > > > related Facebook app. One of the contests has hundreds of thousands
> > > > > > of participants, and I need to be able to tell a user at any given
> > > > > > time what place they're in. eg, you are 34,728th out of 234,829
> > > > > > participants.
>
> > > > > > After spending some time with the docs and browsing this group, it
> > > > > > appears that using the datastore there's no way to accomplish this
> > > > > > relatively mundane task. So do I have to keep a mysql database
> > > > > > active
> > > > > > on some other host just so that I can efficiently do this type of
> > > > > > analysis?
>
> > > > > > Ben
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---