Returning a random element is actually much trickier than it sounds.

First, the approach below does not provide a uniform distribution.  You
are much better off assigning a one-up serial number to each entity
upon creation, then retrieving the first entity whose serial number is
>= randint(1, max_count)  ... and wrapping around to the lowest serial
number if you get no hits.
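A minimal sketch of the serial-number scheme in Python. It assumes the datastore query `WHERE Serial >= r ORDER BY Serial ASC LIMIT 1` can be modeled by a sorted in-memory list of serials. The property name `Serial` and both functions are hypothetical illustrations, not App Engine APIs.

```python
import bisect
import random


def first_at_or_after(serials, target):
    """Model of: WHERE Serial >= target ORDER BY Serial ASC LIMIT 1
    (hypothetical property name), wrapping to the smallest serial when
    there are no hits. `serials` must be sorted and non-empty."""
    i = bisect.bisect_left(serials, target)
    # No hits: cycle back to the start of the serial range.
    return serials[i] if i < len(serials) else serials[0]


def pick_by_serial(serials, max_count, rng=random):
    """Pick a random entity by serial number.

    `serials` stands in for the datastore index (sorted, non-empty);
    `max_count` is the highest serial handed out so far.
    """
    return first_at_or_after(serials, rng.randint(1, max_count))
```

If no serials have been deleted (the list is exactly 1..max_count), each serial is returned for exactly one value of `randint`, so every entity is equally likely.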

I have seen other people espousing the "random/random" approach below,
but it is fundamentally flawed.  Using the approach below, imagine the
lowest-valued element getting a random value of 700000, and the next
one happening to get a random value of 700001.  The first is returned
for every query value below 700000, the second only when the query
value is exactly 700000, so the first element will get far more
"random" requests than the second.  This is obviously an extreme
example, but it exposes the underlying problem: each element's share
of hits is proportional to the gap between its value and the next
lower one.  The more elements you have, the less important this
problem will seem, as your initial random distribution covers more of
the territory, but the fundamental flaw will always be there, with
some elements getting 2x or even 100x the exposure of other elements.
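To make the gap argument concrete, here is a small Python sketch. It assumes distinct RandomProperty values and query values r that start at 1. It computes, without running any queries, how many values of r select each stored value under `WHERE RandomProperty > r ORDER BY RandomProperty ASC LIMIT 1`. The function name is hypothetical.

```python
def hit_counts(random_values):
    """For each stored RandomProperty value, count the query values r >= 1
    for which it is the first value strictly greater than r.

    Assumes distinct values (duplicates would overwrite each other's
    count). Query values at or above the largest stored value return
    nothing, so they are not counted for any element.
    """
    counts = {}
    previous = 1  # smallest query value r
    for v in sorted(random_values):
        # Every r in [previous, v) selects v: it is the first value > r.
        counts[v] = v - previous
        previous = v
    return counts


print(hit_counts([700000, 700001]))  # {700000: 699999, 700001: 1}
```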

Unfortunately, even the approach I first described above does not
give a uniform distribution once you start putting other conditions
into the query.  The problem is that if there are any swaths of your
DB where matching entities are sparse, then entities on the boundaries
of these swaths will get more than their fair share of hits, because
every random value that falls inside a swath lands on the first
matching entity after it.
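Here is a small Python model of the boundary effect. It assumes a hypothetical filter, given as a predicate over serial numbers, combined with the wrap-around serial-number query. It counts how many random targets in 1..max_count land on each matching entity. The function name and the filter are illustrative only.

```python
import bisect


def filtered_hit_counts(max_count, matches):
    """For each target r in 1..max_count, model the query
    WHERE <filter> AND Serial >= r ORDER BY Serial ASC LIMIT 1,
    wrapping to the first match when nothing is >= r.
    `matches` is a hypothetical predicate standing in for the filter.
    Returns {serial: number of targets that select it}.
    """
    matching = [s for s in range(1, max_count + 1) if matches(s)]
    if not matching:
        return {}
    counts = {s: 0 for s in matching}
    for r in range(1, max_count + 1):
        i = bisect.bisect_left(matching, r)
        counts[matching[i] if i < len(matching) else matching[0]] += 1
    return counts


# Hypothetical filter: only serials 1-5 and 16-20 satisfy it.
print(filtered_hit_counts(20, lambda s: s <= 5 or s >= 16))
# {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 16: 11, 17: 1, 18: 1, 19: 1, 20: 1}
```

Serial 16 sits on the boundary of the non-matching swath and is returned 11 times as often as its neighbours.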

The best approach turns out to depend on your particular
circumstances, and unfortunately, there is no "correct" answer for
certain situations other than to read all satisfying entities into
memory, and then do a random selection from those ... which is usually
prohibitively expensive.  Or, you can give up on "perfect"
distribution and use an approximation.
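If you do pay for reading every satisfying entity, reservoir sampling with a reservoir of one item avoids also holding them all in memory. This is a standard technique swapped in here, not something from the thread. Below is a Python sketch, where `entities` is assumed to be any iterable over the query results.

```python
import random


def choose_uniformly(entities, rng=random):
    """Pick one item uniformly from any iterable in a single pass
    (reservoir sampling with a reservoir of size 1).

    Only the current pick is held in memory, but every entity still has
    to be read. Returns None for an empty iterable.
    """
    chosen = None
    for n, entity in enumerate(entities, start=1):
        # Replace the current pick with probability 1/n; after the loop
        # each of the n items has been chosen with probability 1/n.
        if rng.randrange(n) == 0:
            chosen = entity
    return chosen
```

If the results fit in memory anyway, `random.choice(list(results))` gives the same uniform pick more simply.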

I'd be delighted if someone could volunteer a better approach!

On Apr 1, 5:55 am, Barry Hunter <[email protected]> wrote:
> When saving an entity, include a random property. (e.g. use an Integer,
> and set it to a random value between 1 and 1000000)
>
> Then when you want a random record, use the equivalent of this GQL
>
> SELECT * FROM RandomEntity WHERE RandomProperty > 4456 ORDER BY
> RandomProperty ASC LIMIT 1
>
> where the 4456 is randomly chosen.
>
> Or, to account for occasional duplicate RandomProperty values, you could
> include a small offset (say 1-10):
>
> SELECT * FROM RandomEntity WHERE RandomProperty > 4456 ORDER BY
> RandomProperty ASC LIMIT 5,1
>
> I believe that should be reasonably efficient in the datastore.
>
> On 01/04/2009, sagey <[email protected]> wrote:
>
> >  hello,
>
> >  I'd like to be able to return random results from the datastore. I'm
> >  not sure from the documentation how I would go about doing that. I'd
> >  appreciate some pointers.
>
> >  Thanks in advance
>
> --
> Barry
>
> - www.nearby.org.uk - www.geograph.org.uk -
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
-~----------~----~----~----~------~----~------~--~---
