Durham, any time you face this kind of problem, remember to include memcache in your solution, because datastore operations are expensive.

I think the ideal arrangement is like this:

1. Memcache is where you query for data. Think of it as a "database that can die any time". Your app should hit this guy a lot.
2. Datastore is a "backup for re-creating the dead database". Basically you constrain datastore to fallback use only.

How about this:

1. As users post new questions, save the question IDs in memcache, keeping at most 1000 of them (you decide what the limit should be).
2. When you need to present random questions, take those 1000 IDs and do an IN query against the questions which the user has answered. The diff between those 1000 IDs and the query result is the set of questions unanswered by the user.
3. Randomly pick 5 unanswered IDs and then map these IDs into entities. This mapping should use data from memcache as well. Objectify's caching is great for this (Java).

Pros and cons of the suggested solution:

1. Steps 1 and 3 skip datastore completely. Step 2 would fall into small datastore operations (I think) and would therefore be cheaper.
2. This solution would only present random questions that are new-ish. Generally this isn't a big problem.

On Jan 13, 2:35 pm, DurhamG <[email protected]> wrote:
> I've been trying to figure this one out for a few hours now:
>
> I have an ever growing table of questions, and an ever growing table of
> users. For a given user, I would like to query for 5 random questions that
> they have not already answered. What kind of model schema would allow this?
>
> I've seen how to query for N (semi)random
> entities<http://stackoverflow.com/questions/3002999/fetching-a-random-record-f...>
> (which uses a > query filter), and I've seen the presentation
> <http://www.google.com/events/io/2009/sessions/BuildingScalableComplex...>on
> how to do microblogging style schemas (which uses = on lists as a query
> filter), which together might allow me to query for '5 random questions the
> user *has* already answered'.
> To do the opposite though would require > and != on different properties
> which isn't allowed.
>
> The best I can come up with so far is to keep a list of the answered
> questions on the user entity, then query for batches of random questions
> until I find 5 which aren't in their list. This assumes the number of
> questions answered by a single user remains under 5000 (the list size limit
> if I recall) and that there are more unanswered questions than answered
> ones for any given user (so that I don't have to pull too many batches in
> looking for questions). These limitations might be reasonable, but this
> approach still seems less than optimal.
>
> Any ideas on how to accomplish this?
>
> Someone asked this question a couple years ago with no
> response<http://groups.google.com/group/google-appengine/browse_thread/thread/...>,
> so I'm hoping some changes have occurred in the mean time to make this
> possible.
>
> Thanks!
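
For what it's worth, the three steps suggested at the top of this message could be sketched roughly like this in Python. This is only an illustration under stated assumptions, not App Engine code: `recent_ids` is an in-process stand-in for the memcache entry, `answered_ids` stands in for the result of the datastore IN query, and the names `record_new_question` and `pick_unanswered` are made up for the example.

```python
import random
from collections import deque

MAX_RECENT = 1000  # the cap suggested in the reply; tune as needed

# Stand-in (assumption) for the memcache entry that holds recent question IDs.
# A real app would read and write this list through the memcache API instead.
recent_ids = deque(maxlen=MAX_RECENT)


def record_new_question(question_id):
    """Step 1: remember a newly posted question; the oldest ID falls off past the cap."""
    recent_ids.append(question_id)


def pick_unanswered(answered_ids, count=5, rng=random):
    """Steps 2 and 3: diff the recent IDs against the answered ones, then sample.

    answered_ids stands in (assumption) for the IDs returned by the datastore
    IN query over the user's answers, restricted to the recent IDs.
    """
    answered = set(answered_ids)
    unanswered = [qid for qid in recent_ids if qid not in answered]
    # Never ask for more IDs than are actually available.
    return rng.sample(unanswered, min(count, len(unanswered)))
```

Mapping the picked IDs to entities (the second half of step 3) is left out here, since it depends on how the app caches entities. If the user has answered nearly all of the recent questions, `pick_unanswered` simply returns fewer than 5 IDs, which is the "new-ish only" limitation mentioned in the cons.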
