We're rewriting the db module in ndb. Wrapping a get around a keys-only query does not guarantee up-to-date results, because the indexes themselves may be stale. A get on the keys will return the current version of each entity, but since the index that produced those keys may be out of date, you can get back entities that no longer match the query. Suppose:
1. You write a Person with a name "ikai"
2. Ikai changes his name to "superman"
3. You query for everyone whose name is "ikai"
4. You get back a Person whose name is "superman" (get by key is transactional and always returns the latest data)

--
Ikai Lan
Developer Programs Engineer, Google App Engine
plus.ikailan.com | twitter.com/ikai


On Wed, Sep 7, 2011 at 5:28 PM, Joshua Smith <[email protected]> wrote:

> Continuing the dialog with myself :)
>
> I've added this method to one of my classes that extends db.Model() and it
> is working well with the dev appserver in --high_replication mode:
>
>     @classmethod
>     def gql_with_get(cls, query_string, *args, **kwds):
>         return db.get(db.GqlQuery('SELECT __key__ FROM %s %s' % (cls.kind(), query_string),
>                                   *args, **kwds))
>
> You use it just like gql().fetch(). For example:
>
>     boards = BoardModel.gql_with_get("WHERE towns = :1 ORDER BY name", tid)
>
> It doesn't fix the index (things might be out of order, for instance), but
> otherwise, it cures the problem of seeing stale data in HR.
>
> On Sep 7, 2011, at 12:22 PM, Joshua Smith wrote:
>
> > Another thought: The reason I was doing only one meeting per request was
> > because of the old 30 second limit on crons. But cron handlers can be 10
> > minutes now, which is plenty of time to schedule all the meetings.
> > Therefore, I suppose I could do this, right?
> >
> >     now = datetime.datetime.now()
> >     for schedule in db.get(db.GqlQuery(
> >             "SELECT __key__ FROM ScheduleModel WHERE next != :1 AND next < :2",
> >             None, now)):
> >         # db.get() returns None for any entity deleted since the index was read.
> >         if schedule and schedule.next and schedule.next < now:
> >             schedule.cronAuto()
> >
> > Is wrapping a GET around a KEYS-ONLY query guaranteed to get me the
> > real-deal results (except, of course, for the fact that the index might be
> > out-of-date, so I might miss recent changes to who is in/out of the query
> > parameters)? Is this an efficient way to express this, or should I be doing
> > a fetch() on the gql first?
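Ikai's four steps, and the "real-deal results" question above, can be illustrated with a toy in-memory model. This is a sketch, not App Engine code: `StaleIndexStore`, `query_keys()` and `get()` are hypothetical stand-ins for an eventually consistent index and a get by key that always sees the latest write.

```python
# Hypothetical toy model, not App Engine code: entities are always current,
# while the index used to answer queries may lag behind.
class StaleIndexStore:
    def __init__(self):
        self.entities = {}  # key -> properties (always up to date)
        self.index = {}     # key -> snapshot used to answer queries (may be stale)

    def put(self, key, props, update_index=True):
        self.entities[key] = dict(props)
        if update_index:
            self.index[key] = dict(props)

    def query_keys(self, prop, value):
        # Keys-only query answered from the (possibly stale) index.
        return [k for k, snap in sorted(self.index.items()) if snap.get(prop) == value]

    def get(self, keys):
        # Get by key: always the latest entity, or None if it was deleted.
        return [self.entities.get(k) for k in keys]


store = StaleIndexStore()
store.put("person:1", {"name": "ikai"})                          # 1. write "ikai"
store.put("person:1", {"name": "superman"}, update_index=False)  # 2. rename; index lags

keys = store.query_keys("name", "ikai")  # 3. the stale index still matches
fetched = store.get(keys)                # 4. ...but the get returns "superman"

# Re-applying the filter to the fetched entities drops the stale match.
matching = [e for e in fetched if e is not None and e["name"] == "ikai"]
```

Re-checking the filter only removes false positives. An entity renamed to "ikai" after the index snapshot would still be absent from `keys`, which is the "might miss recent changes" caveat in the question.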
> >
> > It seems like it's possible to use a technique like this to get a
> > more-consistent result in cases where that's desirable. It would at least
> > get you consistent data for a subset of the things matching your query. In
> > principle, you could even re-sort the results if there is an ORDER clause.
> > Seems like this would be something useful in the db API...
> >
> > -Joshua
> >
> > On Sep 7, 2011, at 11:18 AM, Joshua Smith wrote:
> >
> >>
> >> I'm trying to port my existing M/S app to HR because I have a gun to my
> >> head with "Threaded Python Only for HR Apps" written on the bullets.
> >>
> >> My system will schedule meetings automatically. Scheduling a meeting
> >> can take some time, because a bunch of records are created, and a bunch of
> >> emails need to go out. So the code to schedule one looked like this:
> >>
> >>     class MeetingAutoHandler(webapp.RequestHandler):
> >>         def get(self):
> >>             schedule = ScheduleModel.gql("WHERE next != :1 AND next < :2",
> >>                                          None, datetime.datetime.now()).get()
> >>             if schedule:
> >>                 schedule.cronAuto()
> >>                 taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=1)
> >>
> >> The query looks for a schedule object that needs a meeting to be
> >> scheduled now. There might be a few of these when the cron runs. So it
> >> does the hard work for one of them (in cronAuto()), and schedules another
> >> call to itself to get the next one using the task queue.
> >>
> >> This isn't going to work in HR because that query is going to keep
> >> finding the same meeting. I could trivially tweak this by setting
> >> countdown=60, but I've yet to hear any of our google overlords commit to a
> >> maximum value of when "eventually" happens in "eventually consistent". I
> >> presume there might be cases, like during data center transitions, when
> >> "eventually" could be a very long time indeed. It is essentially unbounded.
> >> Right?
> >>
> >> But I like the pattern I'm using here, and I'm trying to change as
> >> little code as possible, so I want to put together an HR-resilient version.
> >> Here's what I came up with:
> >>
> >>     class MeetingAutoHandler(webapp.RequestHandler):
> >>         def get(self):
> >>             now = datetime.datetime.now()
> >>             for s in db.GqlQuery(
> >>                     "SELECT __key__ FROM ScheduleModel WHERE next != :1 AND next < :2",
> >>                     None, now):
> >>                 schedule = db.get(s)
> >>                 # db.get() returns None if the entity was deleted after indexing.
> >>                 if schedule and schedule.next and schedule.next < now:
> >>                     schedule.cronAuto()
> >>                     taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=5)
> >>                     return
> >>
> >> So I'm doing a keys-only query and then doing a get() on the key. (I've
> >> never done a keys-only GQL query before, but I think I got it right. Note
> >> to google: There should be an option to Model.gql() to do keys-only
> >> queries!)
> >>
> >> The way I understand HR, that get is going to get the real Model, which
> >> might not meet the criteria in the gql, because the index might be out of
> >> date. Right?
> >>
> >> So I check that the model meets the criteria that I just specified.
> >> (Note to google: It'd be cool if there was a way to test an object against
> >> a query, so I don't have to write the same code twice!)
> >>
> >> Finally, I pushed the next task out a bit, to make it less likely that
> >> I'll have to look at the same objects over and over.
> >>
> >> So what do you think? Any suggestions? (I have a couple things that
> >> work this way, so I want to choose a good design pattern to apply to each of
> >> them.)
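The re-check in the handler above can be exercised with a similar toy model, assuming that `cronAuto()` advances or clears `next` once the meeting is scheduled (the thread does not show its body). `Schedule`, `cron_auto()` and `run_once()` are hypothetical plain-Python stand-ins for `ScheduleModel`, `cronAuto()` and one run of the handler; returning True stands in for the `taskqueue.add(...)` call.

```python
import datetime


class Schedule:
    """Hypothetical stand-in for ScheduleModel."""

    def __init__(self, next_time):
        self.next = next_time
        self.runs = 0

    def cron_auto(self):
        # Assumed behaviour of cronAuto(): schedule the meeting, then clear `next`.
        self.runs += 1
        self.next = None


def run_once(stale_keys, entities, now):
    """One handler invocation; True means "queue another task"."""
    for key in stale_keys:            # keys from a possibly stale index
        schedule = entities.get(key)  # get by key sees current data (or None)
        # Re-test the query's criteria against the fresh entity.
        if schedule is not None and schedule.next and schedule.next < now:
            schedule.cron_auto()
            return True               # corresponds to taskqueue.add(...); return
    return False


now = datetime.datetime(2011, 9, 7, 12, 0)
entities = {
    "a": Schedule(now - datetime.timedelta(hours=1)),
    "b": Schedule(now - datetime.timedelta(minutes=5)),
}
stale_keys = ["a", "b"]  # the index never catches up during this run

invocations = 0
while run_once(stale_keys, entities, now):
    invocations += 1
```

With the index frozen, each schedule is processed exactly once; without the re-test, every invocation would pick "a" again until the index caught up.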
> >>
> >> The complexity would be lessened if I could do this:
> >>
> >>     class MeetingAutoHandler(webapp.RequestHandler):
> >>         def get(self):
> >>             q = ScheduleModel.gql_keys_only("WHERE next != :1 AND next < :2",
> >>                                             None, datetime.datetime.now())
> >>             for s in q:
> >>                 schedule = db.get(s)
> >>                 if q.matches(schedule):
> >>                     schedule.cronAuto()
> >>                     taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=5)
> >>                     return
> >>
> >> This would require two changes: db.Model would need to support
> >> gql_keys_only (that's probably trivial); GqlQuery would need a matches()
> >> method (that's probably not trivial).
> >>
> >> It's still a few more lines, but the complexity is about the same as the
> >> old one.
> >>
> >> Worth the trouble of a couple feature request issues?
> >>
> >> -Joshua
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to [email protected].
> >> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
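The `matches()` method proposed above is not something the `db` API in this thread offers. A rough sketch of the idea, assuming filters are held as simple `(property, operator, value)` tuples rather than parsed from a GQL string, is a helper that re-tests each fetched entity and then re-sorts the survivors to restore a single ascending ORDER BY, as suggested in the 12:22 message. `Filter`, `matches()` and `fetch_consistent()` are hypothetical names.

```python
import datetime
import operator
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical helpers, not part of google.appengine.ext.db.
Filter = namedtuple("Filter", "prop op value")

_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(entity, filters):
    """Return True if the entity's current values satisfy every filter."""
    if entity is None:  # deleted after the index was read; a get returns None
        return False
    for f in filters:
        current = getattr(entity, f.prop, None)
        if f.op not in ("=", "!=") and (current is None or f.value is None):
            # Simplification: None fails ordering comparisons here. This does
            # not reproduce the datastore's own ordering of None values.
            return False
        if not _OPS[f.op](current, f.value):
            return False
    return True


def fetch_consistent(entities, filters, order_by=None):
    """Drop stale matches, then restore an ascending sort on one property."""
    result = [e for e in entities if matches(e, filters)]
    if order_by is not None:
        result.sort(key=operator.attrgetter(order_by))
    return result


now = datetime.datetime(2011, 9, 7, 12, 0)
fetched = [  # what a get over stale keys might return
    SimpleNamespace(key="a", next=now - datetime.timedelta(hours=1)),
    SimpleNamespace(key="b", next=None),  # already processed; index not updated
    None,                                 # deleted since the index was read
    SimpleNamespace(key="c", next=now - datetime.timedelta(hours=2)),
]
filters = [Filter("next", "!=", None), Filter("next", "<", now)]
due = fetch_consistent(fetched, filters, order_by="next")
```

Sorting in memory only orders what was fetched: entities that the stale index left out are still missing, which is the limitation Joshua notes for gql_with_get.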
