We're rewriting the db module in ndb.

Wrapping a get around a keys-only query does not guarantee up-to-date
results. The gets on the keys themselves return current data, but the query
finds those keys through the indexes, which might be stale, so you can get
back entities that no longer match the query. Suppose:

1. You write a Person with a name "ikai"
2. Ikai changes his name to "superman"
3. You query for everyone whose name is "ikai"
4. You get back a Person whose name is "superman" (get by key is
transactional and always returns the latest data)
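
The four steps above can be sketched with a toy in-memory model. This is
only an illustration, not the App Engine API: ToyDatastore, apply_index and
query_with_get are hypothetical names, and the "index" is a plain dict that
is refreshed only when apply_index() is called, standing in for an index
that lags behind writes. It shows why an entity returned by the get should
be re-checked against the query filter.

```python
class ToyDatastore(object):
    """Hypothetical stand-in: entities are always current, the index can lag."""

    def __init__(self):
        self.entities = {}    # key -> properties, updated on every put
        self.name_index = {}  # key -> indexed name, updated only by apply_index

    def put(self, key, name):
        # The entity write is visible immediately; the index is not touched.
        self.entities[key] = {"name": name}

    def apply_index(self):
        # Simulates the index eventually catching up with the entities.
        self.name_index = dict((k, e["name"]) for k, e in self.entities.items())

    def keys_only_query(self, name):
        # Reads the (possibly stale) index and returns matching keys.
        return [k for k, n in self.name_index.items() if n == name]

    def get(self, keys):
        # Get by key always returns the latest stored entity.
        return [dict(self.entities[k]) for k in keys]


def query_with_get(store, name):
    # Keys-only query, get by key, then re-apply the filter to drop
    # entities that no longer match because the index was stale.
    fetched = store.get(store.keys_only_query(name))
    return [e for e in fetched if e["name"] == name]


store = ToyDatastore()
store.put("person1", "ikai")       # step 1
store.apply_index()                # index now lists person1 under "ikai"
store.put("person1", "superman")   # step 2, index not yet updated

stale_keys = store.keys_only_query("ikai")  # step 3: still finds person1
raw = store.get(stale_keys)                 # step 4: name is "superman"
checked = query_with_get(store, "ikai")     # re-check filters it out
```

The re-check only removes false matches. An entity that should match but is
not yet in the index is still missed until the index catches up.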

--
Ikai Lan
Developer Programs Engineer, Google App Engine
plus.ikailan.com | twitter.com/ikai



On Wed, Sep 7, 2011 at 5:28 PM, Joshua Smith <[email protected]> wrote:

> Continuing the dialog with myself :)
>
> I've added this method to one of my classes that extends db.Model() and it
> is working well with the dev appserver in --high_replication mode:
>
>  @classmethod
>  def gql_with_get(cls, query_string, *args, **kwds):
>    # Run a keys-only query, then fetch the entities by key.
>    query = 'SELECT __key__ FROM %s %s' % (cls.kind(), query_string)
>    return db.get(db.GqlQuery(query, *args, **kwds))
>
> You use it just like gql().fetch().  For example:
>
>   boards = BoardModel.gql_with_get("WHERE towns = :1 ORDER BY name", tid)
>
> It doesn't fix the index (things might be out of order, for instance), but
> otherwise, it cures the problem of seeing stale data in HR.
>
> On Sep 7, 2011, at 12:22 PM, Joshua Smith wrote:
>
> > Another thought: The reason I was doing only one meeting per request was
> because of the old 30 second limit on crons.  But cron handlers can be 10
> minutes now, which is plenty of time to schedule all the meetings.
>  Therefore, I suppose I could do this, right?
> >
> >   now = datetime.datetime.now()
> >   keys = db.gql("SELECT __key__ FROM ScheduleModel "
> >                 "WHERE next != :1 AND next < :2", None, now)
> >   for schedule in db.get(keys):
> >     if schedule.next and schedule.next < now:
> >       schedule.cronAuto()
> >
> > Is wrapping a GET around a KEYS-ONLY query guaranteed to get me the
> real-deal results (except, of course, for the fact that the index might be
> out-of-date, so I might miss recent changes to who is in/out of the query
> parameters)?  Is this an efficient way to express this, or should I be doing
> a fetch() on the gql first?
> >
> > It seems like it's possible to use a technique like this to get a
> more-consistent result in cases where that's desirable.  It at least would
> get you consistent data for a subset of things matching your query.  In
> principle, you could even re-sort the results if there is an ORDER clause.
>  Seems like this would be something useful in the db API...
> >
> > -Joshua
> >
> > On Sep 7, 2011, at 11:18 AM, Joshua Smith wrote:
> >
> >>
> >> I'm trying to port my existing M/S app to HR because I have a gun to my
> head with "Threaded Python Only for HR Apps" written on the bullets.
> >>
> >> My system will schedule meetings automatically.  Scheduling a meeting
> can take some time, because a bunch of records are created, and a bunch of
> emails need to go out.  So the code to schedule one looked like this:
> >>
> >> class MeetingAutoHandler(webapp.RequestHandler):
> >>   def get(self):
> >>     schedule = ScheduleModel.gql("WHERE next != :1 AND next < :2", None,
> >>                                  datetime.datetime.now()).get()
> >>     if schedule:
> >>       schedule.cronAuto()
> >>       taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=1)
> >>
> >> The query looks for a schedule object that needs a meeting to be
> scheduled now.  There might be a few of these when the cron runs.  So it
> does the hard work for one of them (in cronAuto()), and schedules another
> call to itself to get the next one using the task queue.
> >>
> >> This isn't going to work in HR because that query is going to keep
> finding the same meeting.  I could trivially tweak this by setting the
> countdown=60, but I've yet to hear any of our google overlords commit to a
> maximum value of when "eventually" happens in "eventually consistent".  I
> presume there might be cases, like during data center transitions, when
> "eventually" could be a very long time indeed.  It is essentially unbounded.
>  Right?
> >>
> >> But I like the pattern I'm using here, and I'm trying to change as
> little code as possible, so I want to put together a HR-resilient version.
>  Here's what I came up with:
> >>
> >> class MeetingAutoHandler(webapp.RequestHandler):
> >>   def get(self):
> >>     now = datetime.datetime.now()
> >>     for s in db.gql("SELECT __key__ FROM ScheduleModel "
> >>                     "WHERE next != :1 AND next < :2", None, now):
> >>       schedule = db.get(s)
> >>       if schedule.next and schedule.next < now:
> >>         schedule.cronAuto()
> >>         taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=5)
> >>         return
> >>
> >> So I'm doing a keys-only query and then doing a get() on the key.  (I've
> never done a keys-only GQL query before, but I think I got it right.  Note
> to google: There should be an option to Model.gql() to do keys-only
> queries!)
> >>
> >> The way I understand HR, that get is going to get the real Model, which
> might not meet the criteria in the gql, because the index might be out of
> date.  Right?
> >>
> >> So I check that the model meets the criteria that I just specified.
>  (Note to google: It'd be cool if there was a way to test an object against
> a query, so I don't have to write the same code twice!)
> >>
> >> Finally, I pushed the next task out a bit, to make it less likely that
> I'll have to look at the same objects over and over.
> >>
> >> So what do you think?  Any suggestions?  (I have a couple things that
> work this way, so I want to choose a good design pattern to apply to each of
> them.)
> >>
> >> The complexity would be lessened if I could do this:
> >>
> >> class MeetingAutoHandler(webapp.RequestHandler):
> >>   def get(self):
> >>     q = ScheduleModel.gql_keys_only("WHERE next != :1 AND next < :2", None,
> >>                                     datetime.datetime.now())
> >>     for s in q:
> >>       schedule = db.get(s)
> >>       if q.matches(schedule):
> >>         schedule.cronAuto()
> >>         taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=5)
> >>         return
> >>
> >> This would require two changes: the db.Model would need to support
> gql_keys_only (that's probably trivial); GqlQuery would need a matches()
> method (that's probably not trivial).
> >>
> >> It's still a few more lines, but the complexity is about the same as the
> old one.
> >>
> >> Worth the trouble of a couple feature request issues?
> >>
> >> -Joshua
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups "Google App Engine" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> [email protected].
> >> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
> >>
> >
>
>
