Re: [google-appengine] Another HR Refactoring Issue - When to clear a cache?

Joshua Smith Tue, 06 Sep 2011 10:20:03 -0700

But with what countdown?

Is there an upper bound on "eventually" in eventually consistent?  How do you 
know that it's safe to do a query?


Can some people who have been using HR for a while weigh in on the longest 
delay they've ever seen for consistency to be achieved?

On Sep 6, 2011, at 1:13 PM, johnP wrote:

> 
> You can transactionally launch a deferred task with a future ETA that
> will clear the Memcache.
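
One way to choose the countdown, sketched under assumptions: reuse the geometric schedule floated earlier in the thread (1, 2, 4, ... 512 seconds) and enqueue one cache clear per step. `schedule_cache_clears` and its `enqueue` hook are hypothetical names, not part of any library. On App Engine, `enqueue` could wrap something like `deferred.defer(clear_fn, _countdown=seconds, _transactional=True)`. Nothing in this thread establishes an upper bound on consistency, so the last retry is still a guess.

```python
def geometric_countdowns(retries=10, first=1):
    """Return countdowns in seconds that double each time: first, 2*first, ..."""
    return [first * 2 ** i for i in range(retries)]


def schedule_cache_clears(enqueue, clear_fn, retries=10, first=1):
    """Ask `enqueue` to run `clear_fn` once per countdown.

    `enqueue(fn, countdown_seconds)` is a hypothetical hook supplied by the
    caller; this function only decides *when* the clears should happen.
    """
    countdowns = geometric_countdowns(retries, first)
    for seconds in countdowns:
        enqueue(clear_fn, seconds)
    return countdowns
```

With the defaults, the clears are spread over roughly 17 minutes (the countdowns sum to 1023 seconds).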
> 
> johnP
> 
> 
> On Sep 6, 9:55 am, Joshua Smith <[email protected]> wrote:
>> And you and your users are happy with this, Tom?
>> 
>> I've been thinking more about entity groups, since they make this 
>> consistency mess go away.  In fact, while Entity Groups are not a great 
>> solution for all my data, they do seem to fit for most of my models.  
>> Meetings can be parented by Boards, Meeting Changes can be parented by 
>> Meetings, etc..  So it's really only the top-level (Town) view where the 
>> consistency will be all over the place, and doing what you suggested for 
>> that seems like it could work.
>> 
>> However, I'm now realizing that adding entity groups to an existing 
>> application (even during the transition to HR) is a real problem.
>> 
>> - I won't be able to use the migration tools, since they are going to create 
>> the new entities matching the old ones, and you can't change a parent once 
>> the model is created.
>> 
>> - The trick of making a new key from the old key (to compensate for the app 
>> ID changing) won't work, because the parent paths will be wrong.
>> 
>> So if I wanted to use entity groups, the best I can come up with is:
>> 
>> - Do the migration from M/S to HR completely manually, with lots of 
>> receiving-side logic to build up entity groups where none existed before
>> 
>> - Create a table mapping old app keys to new app keys, so that when I get a 
>> request containing a key with the old app ID, I can look it up in the table 
>> to find out the new key (which includes a parent path)
>> 
>> Also, there are a few places where I use IDs in URLs instead of keys (like 
>> in a URL that might be sent by email, to keep it short).  By my reading, 
>> that won't work, because I can't call Model.get_by_id() unless I know the 
>> parent already. So once you go to entity groups, IDs are pretty much 
>> useless.  (Is that right?)  I guess I could create id-like things that 
>> combine the IDs of all the things up the chain of parenting 
>> (1234_5678_9012).  The long strings of digits might still be nicer than full 
>> keys.
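
A minimal sketch of the composite-ID idea above, assuming every ancestor uses a numeric ID and the chain of kinds is known from the URL route. The function names and the kind names (`Board`, `Meeting`, `MeetingChange`) are illustrative assumptions. The flat list that `decode_id_path` returns has the same shape as the arguments to `db.Key.from_path(*path)` in the Python datastore API.

```python
def encode_id_path(ids):
    """Join the numeric IDs from root ancestor to leaf, e.g. '1234_5678_9012'."""
    return "_".join(str(i) for i in ids)


def decode_id_path(token, kinds):
    """Turn '1234_5678_9012' into ['Board', 1234, 'Meeting', 5678, ...].

    `kinds` lists the kind at each level, root first. Raises ValueError if
    the token has the wrong number of parts or a part is not an integer.
    """
    parts = token.split("_")
    if len(parts) != len(kinds):
        raise ValueError("expected %d IDs, got %d" % (len(kinds), len(parts)))
    path = []
    for kind, part in zip(kinds, parts):
        path.extend([kind, int(part)])
    return path
```

The token stays short compared with a full encoded key, at the cost of hard-coding the kind chain per URL.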
>> 
>> I'm feeling like I have a choice between a non-entity-group solution with a 
>> ton of problems caused by consistency, or an entity-group solution with a ton 
>> of problems caused by migration (for those models where entity groups are 
>> feasible at all).  Lesser of two evils design choices.  Lovely.
>> 
>> I'm really starting to hate HR.
>> 
>> -Joshua
>> 
>> On Sep 5, 2011, at 4:13 PM, Tom Phillips wrote:
>> 
>>> What I try to do in these cases is a combination of:
>> 
>>> - when a new xxxxx is added, say "Your xxxxx has been added
>>> successfully. It may take a few minutes to appear in the xxxxx list"
>>> - In the xxxxx listing page, only show the most basic info for each
>>> (Ideally a name that doesn't change, but see next bullet). So a
>>> vanilla query across entity groups is fine here. Some users may see
>>> the new xxxxx sooner than others (no biggie, matter of seconds
>>> usually)
>>> - If the name _can_ get changed, say something to the changer like
>>> "The new name for your xxxxx may take a few minutes to appear in the
>>> xxxxx list"
>>> - Clicking on the "details" of a xxxxx does a get-by-key, ensuring the
>>> latest
>> 
>>> And if you're going to cache the list page, have the cache maintained
>>> by a cron job, to ensure there are no collisions. Cron job needs to be
>>> a bit smart, since in theory it could also see a new xxxxx one run,
>>> then not see it on a query a minute later. But at least there is a
>>> single place to deal with that with no contention on the cache.
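
One possible reading of "a bit smart", sketched under assumptions: the cron job remembers every key it has ever seen, so a key that a later query misses is not dropped, and it re-reads each entity by key for fresh data. `rebuild_listing`, `query_keys`, `get_by_key`, and `known_keys` are hypothetical names; where `known_keys` is persisted between runs is left open.

```python
def rebuild_listing(query_keys, get_by_key, known_keys):
    """Rebuild the cached listing from queried plus remembered keys.

    query_keys():    eventually consistent query; may miss recent entities.
    get_by_key(key): strongly consistent get; returns None if deleted.
    known_keys:      set of keys seen on earlier runs, updated in place.
    """
    known_keys.update(query_keys())
    listing = []
    # sorted() copies the set, so discarding during the loop is safe.
    for key in sorted(known_keys):
        entity = get_by_key(key)
        if entity is None:
            known_keys.discard(key)  # really gone, stop remembering it
        else:
            listing.append(entity)
    return listing
```

Because only the cron job writes the cache, there is still a single writer and no contention.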
>> 
>>> /Tom
>> 
>>> On Sep 5, 1:58 pm, Joshua Smith <[email protected]> wrote:
>>>> Yeah, that'll work, too.  Unless there are two changes to the list of 
>>>> boards at about the same time.  Then it will give a bizarre result.
>> 
>>>> The general idea of augmenting queries with extra results that you suspect 
>>>> it might not get seems to come up a lot as a "solution" to the eventual 
>>>> consistency problem.  Other than entity groups (which don't seem to fit 
>>>> any of my real-world data models), this is the only suggestion in the docs.
>> 
>>>> However, it really kinda sucks.  Since query results are usually sorted in 
>>>> some way, you need to re-sort to get the fixed record into the right 
>>>> place.  But, of course, you can't do that if you are using the query as an 
>>>> iterator.  And, of course, the change may have made that record no longer 
>>>> fit the query criteria, or it may have caused it to fit where it didn't 
>>>> before, so you need to deal with all three cases: remove the record from 
>>>> the list of results, add the record to the list of results, and replace 
>>>> the record in a list of results.  And then there is the question of where 
>>>> you put the record.  Putting it into a cookie or query parameter is weird, 
>>>> and fragile.  Putting it into memcache is fragile.
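
The three cases above (remove, add, replace) can be folded into one reusable helper, sketched here under assumptions. `patch_results` and its parameters are hypothetical names. It needs the query results as a list rather than an iterator, and `fresh` is the entity just re-read by key.

```python
def patch_results(results, fresh, matches, sort_key, key_of):
    """Return a new sorted list with `fresh` removed, added, or replaced.

    results:  list of entities from the (possibly stale) query.
    fresh:    the up-to-date entity, fetched by key.
    matches:  predicate, True if an entity satisfies the query's filter.
    sort_key: function giving the query's sort order.
    key_of:   function returning an entity's unique key.
    """
    fresh_key = key_of(fresh)
    # Drop any stale copy; this covers both "remove" and "replace".
    patched = [r for r in results if key_of(r) != fresh_key]
    # Re-add only if it still fits; this covers "add" and "replace".
    if matches(fresh):
        patched.append(fresh)
    return sorted(patched, key=sort_key)
```

This does not settle where the fresh record is carried between requests, which is the fragile part described above, and a deleted entity would need its own path.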
>> 
>>>> I suppose it's possible that there just is no good design pattern to use 
>>>> here.  In particular, I'm having trouble coming up with a way that doesn't 
>>>> require me to wrap 3 old lines of code with 20 lines of crap, then 
>>>> repeating that in 50 different places.  I don't mind having a bunch of 
>>>> crap if there is some way to reuse it.  But I'm struggling with that.
>> 
>>>> Is ANYBODY out there happy with the way they solved this problem?
>> 
>>>> -Joshua
>> 
>>>> On Sep 4, 2011, at 5:51 PM, Tom Phillips wrote:
>> 
>>>>> "I clear the cache whenever the list of boards changes in some way"
>> 
>>>>> How about update the cache at that point instead of clearing it?
>> 
>>>>> If need be, you could even generate the HTML for the cache update with a
>>>>> URLFetch to your UI handler where you include the added/changed board
>>>>> key(s) as parameters, so they can be gotten with strong consistency on
>>>>> that request and merged into the query result.
>> 
>>>>> /Tom
>> 
>>>>> On Sep 4, 11:16 am, Joshua Smith <[email protected]> wrote:
>>>>>> My monkeypatching solution (see my recent post in the -python group), 
>>>>>> which Guido says I shouldn't use, but which is just so darned pretty I 
>>>>>> can't help it, has gotten me through the first challenge of switching to 
>>>>>> HR, which is dealing with Google search results containing keys into my 
>>>>>> old app's data store.
>> 
>>>>>> So now I'm looking at the big Kahuna problem of consistency.  Here's my 
>>>>>> first messy challenge there:
>> 
>>>>>> My app puts a list of boards on the home page for a town, along with the 
>>>>>> list of meetings.  Generating that list of boards was taking a lot of 
>>>>>> CPU, but they hardly ever change, so I put in a memcache system that 
>>>>>> built the HTML when it wasn't in the cache, and then cached it before 
>>>>>> serving.  I clear the cache whenever the list of boards changes in some 
>>>>>> way.
>> 
>>>>>> Well that ain't gonna work in HR.  It's quite possible that I update a 
>>>>>> board, clear the cache, and someone comes and hits that page before 
>>>>>> "eventually consistent" comes to pass.  So now I've got a cached copy of 
>>>>>> the stale data.
>> 
>>>>>> (Note that I cannot use entity groups to solve this because some boards 
>>>>>> are municipal agencies, and therefore cannot be parented to the town 
>>>>>> that is building its list.  I could parent all boards to some global 
>>>>>> parent, but, well, yuck.)
>> 
>>>>>> I have some different ideas about how to fix this, but I'm wondering if 
>>>>>> anyone else who's done the port to HR has come up with a solution they 
>>>>>> find particularly elegant?  I assume this is a pretty common problem, so 
>>>>>> there must be a design pattern out there… somewhere.
>> 
>>>>>> Here are my ideas:
>> 
>>>>>> - Clear the cache with a periodic task that re-clears it several times.  
>>>>>> I'm thinking a recurring geometric retry would be prudent (1, 2, 4, 8, 
>>>>>> 16, 32, 64, 128, 256, 512 seconds, and then pray that we have 
>>>>>> consistency)
>> 
>>>>>> - Checksum the modified or new board, and put that sum into memcache.  
>>>>>> When generating the new board, confirm that any checksums are good.  
>>>>>> This seems more deterministic, except I don't trust memcache not to 
>>>>>> squelch the checksum record.  So perhaps I should do something in the 
>>>>>> datastore.  This feels like it'd be about 10x as much code as the stupid 
>>>>>> geometric flush.
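
A rough sketch of the checksum idea, under assumptions: each board is treated as a dict of string fields, and the checksum recorded at write time is kept somewhere durable (memcache or the datastore, as weighed above). `board_checksum`, `safe_to_cache`, and the data shapes are hypothetical.

```python
import hashlib


def board_checksum(fields):
    """Stable SHA-1 over a board's fields (dict of str -> str)."""
    digest = hashlib.sha1()
    for name in sorted(fields):
        digest.update(("%s=%s\n" % (name, fields[name])).encode("utf-8"))
    return digest.hexdigest()


def safe_to_cache(query_results, expected):
    """True if the query already reflects every recorded write.

    query_results: dict of key -> fields, from the eventually consistent query.
    expected:      dict of key -> checksum recorded when the board was saved.
    """
    for key, checksum in expected.items():
        fields = query_results.get(key)
        if fields is None or board_checksum(fields) != checksum:
            return False
    return True
```

If `safe_to_cache` returns False, the page can still be served from the query but should not be stored in the cache. Deleted boards are not handled here.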
>> 
>>>>>> Any suggestions?
>> 
>>>>>> -Joshua
>> 
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "Google App Engine" group.
>>>>> To post to this group, send email to [email protected].
>>>>> To unsubscribe from this group, send email to 
>>>>> [email protected].
>>>>> For more options, visit this group at 
>>>>> http://groups.google.com/group/google-appengine?hl=en.
> 
