Re: [google-appengine] More HR Refactoring

Joshua Smith Thu, 08 Sep 2011 10:50:01 -0700

I hadn't, primarily because of the non-deterministic nature of relying on a cache 
I don't control flushes on.  It feels like I'm trading one consistency issue 
for another.
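
For anyone reading along, here is a minimal sketch of what a write-through cache means in this context, with plain dicts standing in for memcache and the datastore. WriteThroughCache and its method names are made up for illustration; none of this is the App Engine API. The flush() call plays the part of a cache flush the app does not control: reads still work afterwards, but only by falling back to the store.

```python
class WriteThroughCache(object):
    """Toy write-through cache over a dict-like store (illustration only)."""

    def __init__(self, store):
        self._store = store   # authoritative storage (stand-in for the datastore)
        self._cache = {}      # stand-in for memcache

    def put(self, key, value):
        self._store[key] = value   # write the authoritative copy first
        self._cache[key] = value   # then refresh the cached copy

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._store.get(key)   # cache miss: fall back to the store
        if value is not None:
            self._cache[key] = value   # repopulate for later reads
        return value

    def flush(self):
        # Models a flush that happens outside the app's control.
        self._cache.clear()


store = {}
cache = WriteThroughCache(store)
cache.put("board1", "Planning")
before_flush = cache.get("board1")   # served from the cache
cache.flush()
after_flush = cache.get("board1")    # served from the store, then re-cached
```

Against the real datastore the fallback would need to be a get by key to stay consistent; falling back to a query would bring the stale-index problem right back.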


In analyzing my app, it seems that other than the one caching thing I already 
discussed, the only place where users are likely to see data consistency 
quirks is in the admin panels, where they often have the pattern [ Pick from 
List ] [ Edit ] [ Save ] [ See List Again ].  And in this case, the trivial 
change to do a keys-only GQL query followed by a consistent GET seems to solve 
the issue.  So I think I'm all set.
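
A toy model of why that change helps, as a sketch only. ToyStore and its methods (put, get, query_entities, query_keys, catch_up) are invented names, not the App Engine API. It assumes the behavior described further down this thread: a get by key sees the latest put, while a query can return an entity as it looked before the put.

```python
class ToyStore(object):
    """In-memory stand-in for an eventually consistent datastore (not the real API)."""

    def __init__(self):
        self._entities = {}   # key -> latest data; what a get by key sees
        self._snapshot = {}   # key -> possibly stale data; what queries see

    def put(self, key, data):
        # The write is immediately visible to get().
        self._entities[key] = dict(data)
        # Assumption of this toy: a new key appears in the query view at once,
        # but later changes to it lag behind until catch_up() is called.
        self._snapshot.setdefault(key, dict(data))

    def get(self, keys):
        # Get by key always reads the latest data.
        return [dict(self._entities[k]) for k in keys]

    def query_entities(self):
        # A plain query returns whatever the (possibly stale) view holds.
        return [dict(v) for _, v in sorted(self._snapshot.items())]

    def query_keys(self):
        # A keys-only query returns keys from the same view.
        return sorted(self._snapshot)

    def catch_up(self):
        # "Eventually": the query view converges on the latest data.
        self._snapshot = dict((k, dict(v)) for k, v in self._entities.items())


store = ToyStore()
store.put("board1", {"name": ""})           # create a blank entity
store.put("board1", {"name": "Planning"})   # edit it and put it
stale = store.query_entities()              # plain query: may show the blank entity
fresh = store.get(store.query_keys())       # keys-only query, then get by key
```

The extra round trip buys fresh data for every key the query returns; it does not help with keys the stale view has not picked up yet.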

Once I get through this, I plan to write up all my little recipes and 
lessons-learned, because there isn't much out there for people taking on M/S to 
HR refactoring.

On Sep 8, 2011, at 1:05 PM, Ikai Lan (Google) wrote:

> Joshua, have you considered memcache as a write-through cache? Memcache 
> should be strongly consistent with the caveat that when the data center 
> changes or any irregularity is detected, we flush the cache.
> 
> --
> Ikai Lan 
> Developer Programs Engineer, Google App Engine
> plus.ikailan.com | twitter.com/ikai
> 
> 
> 
> On Thu, Sep 8, 2011 at 4:36 AM, Joshua Smith <[email protected]> wrote:
> In the dev appserver with --high_replication, if I do this:
> 
> 1. Create a blank entity
> 2. Edit that entity
> 3. put that entity
> 4. Generate a list of entities
> 
> That list, more often than not, shows me the blank entity.  The entity is in 
> the index, but it does not reflect the put in step #3.  This is consistent 
> with the "eventual consistency" model I've read about.
> 
> If, in step 4, I generate a list of all entity KEYS, and then db.get() those 
> entities, I never see the blank entity.  This is also consistent with the 
> documentation.
> 
> So, I believe that while my pattern requires an extra db roundtrip, it is 
> much less likely to show information that will lead to a support call.
> 
> -Joshua
> 
> On Sep 8, 2011, at 12:47 AM, Robert Kluin wrote:
> 
> > This is done on the backend, if I remember correctly.  It doesn't gain
> > you anything.
> >
> >
> >
> > On Wed, Sep 7, 2011 at 19:28, Joshua Smith <[email protected]> wrote:
> >> Continuing the dialog with myself :)
> >>
> >> I've added this method to one of my classes that extends db.Model() and it 
> >> is working well with the dev appserver in --high_replication mode:
> >>
> >>  @classmethod
> >>  def gql_with_get(cls, query_string, *args, **kwds):
> >>   return db.get(db.GqlQuery('SELECT __key__ FROM %s %s' %
> >>                             (cls.kind(), query_string), *args, **kwds))
> >>
> >> You use it just like gql().fetch().  For example:
> >>
> >>   boards = BoardModel.gql_with_get("WHERE towns = :1 ORDER BY name", tid)
> >>
> >> It doesn't fix the index (things might be out of order, for instance), but 
> >> otherwise, it cures the problem of seeing stale data in HR.
> >>
> >> On Sep 7, 2011, at 12:22 PM, Joshua Smith wrote:
> >>
> >>> Another thought: The reason I was doing only one meeting per request was 
> >>> because of the old 30 second limit on crons.  But cron handlers can be 10 
> >>> minutes now, which is plenty of time to schedule all the meetings.  
> >>> Therefore, I suppose I could do this, right?
> >>>
> >>>   now = datetime.datetime.now()
> >>>   for schedule in db.get(db.GqlQuery("SELECT __key__ FROM ScheduleModel "
> >>>                                      "WHERE next != :1 AND next < :2",
> >>>                                      None, now)):
> >>>     if schedule.next and schedule.next < now:
> >>>       schedule.cronAuto()
> >>>
> >>> Is wrapping a GET around a KEYS-ONLY query guaranteed to get me the 
> >>> real-deal results (except, of course, for the fact that the index might 
> >>> be out-of-date, so I might miss recent changes to who is in/out of the 
> >>> query parameters)?  Is this an efficient way to express this, or should I 
> >>> be doing a fetch() on the gql first?
> >>>
> >>> It seems like it's possible to use a technique like this to get a 
> >>> more-consistent result in cases where that's desirable.  It at least 
> >>> would get you consistent data for a subset of things matching your 
> >>> query.  In principle, you could even re-sort the results if there is an 
> >>> ORDER clause.  Seems like this would be something useful in the db API...
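
The re-check and re-sort idea from the paragraph above can be sketched in plain Python. This is an illustration under assumptions, not part of the db API: entities are modeled as dicts, and is_due and recheck_and_sort are hypothetical helper names. The predicate repeats the query's WHERE clause by hand, and the sort key repeats its ORDER clause.

```python
import datetime


def is_due(schedule, now):
    """Same criteria as the GQL in this thread: next != None AND next < now."""
    return schedule["next"] is not None and schedule["next"] < now


def recheck_and_sort(entities, predicate, sort_key):
    """Drop entities that no longer match, then restore the intended order."""
    return sorted((e for e in entities if predicate(e)), key=sort_key)


now = datetime.datetime(2011, 9, 7, 12, 0)

# Pretend these came back from a get on the keys of a stale keys-only query.
fetched = [
    {"name": "b", "next": datetime.datetime(2011, 9, 7, 11, 0)},
    {"name": "a", "next": datetime.datetime(2011, 9, 7, 10, 0)},
    {"name": "c", "next": None},                                  # already handled
    {"name": "d", "next": datetime.datetime(2011, 9, 8, 10, 0)},  # rescheduled
]

due = recheck_and_sort(fetched,
                       lambda s: is_due(s, now),
                       lambda s: s["next"])
due_names = [s["name"] for s in due]
```

This only cleans up what the query returned; entities that recently started matching but are not yet in the index are still missed.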
> >>>
> >>> -Joshua
> >>>
> >>> On Sep 7, 2011, at 11:18 AM, Joshua Smith wrote:
> >>>
> >>>>
> >>>> I'm trying to port my existing M/S app to HR because I have a gun to my 
> >>>> head with "Threaded Python Only for HR Apps" written on the bullets.
> >>>>
> >>>> My system will schedule meetings automatically.  Scheduling a meeting 
> >>>> can take some time, because a bunch of records are created, and a bunch 
> >>>> of emails need to go out.  So the code to schedule one looked like this:
> >>>>
> >>>> class MeetingAutoHandler(webapp.RequestHandler):
> >>>>   def get(self):
> >>>>     schedule = ScheduleModel.gql("WHERE next != :1 AND next < :2", None,
> >>>>                                  datetime.datetime.now()).get()
> >>>>     if schedule:
> >>>>       schedule.cronAuto()
> >>>>       taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=1)
> >>>>
> >>>> The query looks for a schedule object that needs a meeting to be 
> >>>> scheduled now.  There might be a few of these when the cron runs.  So it 
> >>>> does the hard work for one of them (in cronAuto()), and schedules 
> >>>> another call to itself to get the next one using the task queue.
> >>>>
> >>>> This isn't going to work in HR because that query is going to keep 
> >>>> finding the same meeting.  I could trivially tweak this by setting the 
> >>>> countdown=60, but I've yet to hear any of our google overlords commit to 
> >>>> a maximum value of when "eventually" happens in "eventually consistent". 
> >>>>  I presume there might be cases, like during data center transitions, 
> >>>> when "eventually" could be a very long time indeed.  It is essentially 
> >>>> unbounded.  Right?
> >>>>
> >>>> But I like the pattern I'm using here, and I'm trying to change as 
> >>>> little code as possible, so I want to put together a HR-resilient 
> >>>> version.  Here's what I came up with:
> >>>>
> >>>> class MeetingAutoHandler(webapp.RequestHandler):
> >>>>   def get(self):
> >>>>     now = datetime.datetime.now()
> >>>>     for s in db.GqlQuery("SELECT __key__ FROM ScheduleModel "
> >>>>                          "WHERE next != :1 AND next < :2", None, now):
> >>>>       schedule = db.get(s)
> >>>>       if schedule.next and schedule.next < now:
> >>>>         schedule.cronAuto()
> >>>>         taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=5)
> >>>>         return
> >>>>
> >>>> So I'm doing a keys-only query and then doing a get() on the key.  (I've 
> >>>> never done a keys-only GQL query before, but I think I got it right.  
> >>>> Note to google: There should be an option to Model.gql() to do keys-only 
> >>>> queries!)
> >>>>
> >>>> The way I understand HR, that get is going to get the real Model, which 
> >>>> might not meet the criteria in the gql, because the index might be out 
> >>>> of date.  Right?
> >>>>
> >>>> So I check that the model meets the criteria that I just specified.  
> >>>> (Note to google: It'd be cool if there was a way to test an object 
> >>>> against a query, so I don't have to write the same code twice!)
> >>>>
> >>>> Finally, I pushed the next task out a bit, to make it less likely that 
> >>>> I'll have to look at the same objects over and over.
> >>>>
> >>>> So what do you think?  Any suggestions?  (I have a couple things that 
> >>>> work this way, so I want to choose a good design pattern to apply to 
> >>>> each of them.)
> >>>>
> >>>> The complexity would be lessened if I could do this:
> >>>>
> >>>> class MeetingAutoHandler(webapp.RequestHandler):
> >>>>   def get(self):
> >>>>     q = ScheduleModel.gql_keys_only("WHERE next != :1 AND next < :2", None,
> >>>>                                     datetime.datetime.now())
> >>>>     for s in q:
> >>>>       schedule = db.get(s)
> >>>>       if q.matches(schedule):
> >>>>         schedule.cronAuto()
> >>>>         taskqueue.add(url='/admin/meetingAuto', method='GET', countdown=5)
> >>>>         return
> >>>>
> >>>> This would require two changes: the db.Model would need to support 
> >>>> gql_keys_only (that's probably trivial); GqlQuery would need a matches() 
> >>>> method (that's probably not trivial).
> >>>>
> >>>> It's still a few more lines, but the complexity is about the same as the 
> >>>> old one.
> >>>>
> >>>> Worth the trouble of a couple feature request issues?
> >>>>
> >>>> -Joshua
> >>>>
> >>>> --
> >>>> You received this message because you are subscribed to the Google 
> >>>> Groups "Google App Engine" group.
> >>>> To post to this group, send email to [email protected].
> >>>> To unsubscribe from this group, send email to 
> >>>> [email protected].
> >>>> For more options, visit this group at 
> >>>> http://groups.google.com/group/google-appengine?hl=en.
> >>>>
