But with what countdown? Is there an upper bound on "eventually" in eventually consistent? How do you know that it's safe to do a query?
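For reference, here is a minimal sketch of the geometric re-clear schedule suggested earlier in this thread (1, 2, 4, ... 512 seconds), in plain Python. The names `geometric_countdowns`, `schedule_reclears`, and the `enqueue` callback are hypothetical; `enqueue` stands in for a real task API (on App Engine that would presumably be a transactional deferred task with a countdown). It does not answer the upper-bound question, it only spreads the bets over a longer window.

```python
def geometric_countdowns(first=1, factor=2, last=512):
    """Yield delays in seconds: first, first*factor, ... up to and including last."""
    delay = first
    while delay <= last:
        yield delay
        delay *= factor


def schedule_reclears(clear_cache, enqueue, cache_key):
    """Queue one delayed cache clear per countdown, all measured from now.

    enqueue(func, arg, countdown) is a hypothetical stand-in for the task
    API; it is assumed to run func(arg) roughly `countdown` seconds later.
    """
    countdowns = list(geometric_countdowns())
    for countdown in countdowns:
        enqueue(clear_cache, cache_key, countdown)
    return countdowns


if __name__ == "__main__":
    queued = []
    schedule_reclears(
        lambda key: None,  # placeholder for the real cache delete
        lambda func, arg, countdown: queued.append((arg, countdown)),
        "boards_html:town42",  # hypothetical cache key
    )
    print(queued)
```

With these defaults the last clear fires 512 seconds after the write, so a stale page cached after that point would still stick around until the next change.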
Can some people who have been using HR for a while weigh in on the longest delay they've ever seen before consistency was achieved?

On Sep 6, 2011, at 1:13 PM, johnP wrote:

>
> You can transactionally launch a deferred task with a future ETA that
> will clear the Memcache.
>
> johnP
>
>
> On Sep 6, 9:55 am, Joshua Smith <[email protected]> wrote:
>> And you and your users are happy with this, Tom?
>>
>> I've been thinking more about entity groups, since they make this
>> consistency mess go away. In fact, while Entity Groups are not a great
>> solution for all my data, they do seem to fit for most of my models.
>> Meetings can be parented by Boards, Meeting Changes can be parented by
>> Meetings, etc. So it's really only the top-level (Town) view where the
>> consistency will be all over the place, and doing what you suggested for
>> that seems like it could work.
>>
>> However, I'm now realizing that adding entity groups to an existing
>> application (even during the transition to HR) is a real problem.
>>
>> - I won't be able to use the migration tools, since they are going to create
>> the new entities matching the old ones, and you can't change a parent once
>> the model is created.
>>
>> - The trick of making a new key from the old key (to compensate for the app
>> ID changing) won't work, because the parent paths will be wrong.
>>
>> So if I wanted to use entity groups, the best I can come up with is:
>>
>> - Do the migration from M/S to HR completely manually, with lots of
>> receiving-side logic to build up entity groups where none existed before
>>
>> - Create a table mapping old app keys to new app keys, so that when I get a
>> request containing a key with the old app ID, I can look it up in the table
>> to find out the new key (which includes a parent path)
>>
>> Also, there are a few places where I use IDs in URLs instead of keys (like
>> in a URL that might be sent by email, to keep it short).
>> By my reading, that won't work, because I can't call Model.get_by_id()
>> unless I know the parent already. So once you go to entity groups, IDs
>> are pretty much useless. (Is that right?) I guess I could create ID-like
>> things that combine the IDs of all the things up the chain of parenting
>> (1234_5678_9012). The long strings of digits might still be nicer than full
>> keys.
>>
>> I'm feeling like I have a choice between a non-entity-group solution with a
>> ton of problems caused by consistency, or an entity-group solution with a ton
>> of problems caused by migration (for those models where entity groups are
>> feasible at all). Lesser of two evils design choices. Lovely.
>>
>> I'm really starting to hate HR.
>>
>> -Joshua
>>
>> On Sep 5, 2011, at 4:13 PM, Tom Phillips wrote:
>>
>>> What I try to do in these cases is a combination of:
>>
>>> - When a new xxxxx is added, say "Your xxxxx has been added
>>> successfully. It may take a few minutes to appear in the xxxxx list"
>>> - In the xxxxx listing page, only show the most basic info for each
>>> (ideally a name that doesn't change, but see next bullet). So a
>>> vanilla query across entity groups is fine here. Some users may see
>>> the new xxxxx sooner than others (no biggie, matter of seconds
>>> usually)
>>> - If the name _can_ get changed, say something to the changer like
>>> "The new name for your xxxxx may take a few minutes to appear in the
>>> xxxxx list"
>>> - Clicking on the "details" of a xxxxx does a get-by-key, ensuring the
>>> latest
>>
>>> And if you're going to cache the list page, have the cache maintained
>>> by a cron job, to ensure there are no collisions. Cron job needs to be
>>> a bit smart, since in theory it could also see a new xxxxx one run,
>>> then not see it on a query a minute later. But at least there is a
>>> single place to deal with that with no contention on the cache.
>>
>>> /Tom
>>
>>> On Sep 5, 1:58 pm, Joshua Smith <[email protected]> wrote:
>>>> Yeah, that'll work, too. Unless there are two changes to the list of
>>>> boards at about the same time. Then it will give a bizarre result.
>>
>>>> The general idea of augmenting queries with extra results that you suspect
>>>> it might not get seems to come up a lot as a "solution" to the eventual
>>>> consistency problem. Other than entity groups (which don't seem to fit
>>>> any of my real-world data models), this is the only suggestion in the docs.
>>
>>>> However, it really kinda sucks. Since query results are usually sorted in
>>>> some way, you need to re-sort to get the fixed record into the right
>>>> place. But, of course, you can't do that if you are using the query as an
>>>> iterator. And, of course, the change may have made that record no longer
>>>> fit the query criteria, or it may have caused it to fit where it didn't
>>>> before, so you need to deal with all three cases: remove the record from
>>>> the list of results, add the record to the list of results, and replace
>>>> the record in a list of results. And then there is the question of where
>>>> you put the record. Putting it into a cookie or query parameter is weird,
>>>> and fragile. Putting it into memcache is fragile.
>>
>>>> I suppose it's possible that there just is no good design pattern to use
>>>> here. In particular, I'm having trouble coming up with a way that doesn't
>>>> require me to wrap 3 old lines of code with 20 lines of crap, then
>>>> repeating that in 50 different places. I don't mind having a bunch of
>>>> crap if there is some way to reuse it. But I'm struggling with that.
>>
>>>> Is ANYBODY out there happy with the way they solved this problem?
>>
>>>> -Joshua
>>
>>>> On Sep 4, 2011, at 5:51 PM, Tom Phillips wrote:
>>
>>>>> "I clear the cache whenever the list of boards changes in some way"
>>
>>>>> How about updating the cache at that point instead of clearing it?
>>
>>>>> If need be, you could even generate the HTML for the cache update with a
>>>>> URLFetch to your UI handler where you include the added/changed board
>>>>> key(s) as parameters, so they can be gotten with strong consistency on
>>>>> that request and merged into the query result.
>>
>>>>> /Tom
>>
>>>>> On Sep 4, 11:16 am, Joshua Smith <[email protected]> wrote:
>>>>>> My monkeypatching solution (see my recent post in the -python group),
>>>>>> which Guido says I shouldn't use, but which is just so darned pretty I
>>>>>> can't help it, has gotten me through the first challenge of switching to
>>>>>> HR, which is dealing with google search results containing keys into my
>>>>>> old app's data store.
>>
>>>>>> So now I'm looking at the big Kahuna problem of consistency. Here's my
>>>>>> first messy challenge there:
>>
>>>>>> My app puts a list of boards on the home page for a town, along with the
>>>>>> list of meetings. Generating that list of boards was taking a lot of
>>>>>> CPU, but they hardly ever change, so I put in a memcache system that
>>>>>> built the HTML when it wasn't in the cache, and then cached it before
>>>>>> serving. I clear the cache whenever the list of boards changes in some
>>>>>> way.
>>
>>>>>> Well that ain't gonna work in HR. It's quite possible that I update a
>>>>>> board, clear the cache, and someone comes and hits that page before
>>>>>> "eventually consistent" comes to pass. So now I've got a cached copy of
>>>>>> the stale data.
>>
>>>>>> (Note that I cannot use entity groups to solve this because some boards
>>>>>> are municipal agencies, and therefore cannot be parented to the town
>>>>>> that is building its list. I could parent all boards to some global
>>>>>> parent, but, well, yuck.)
>>
>>>>>> I have some different ideas about how to fix this, but I'm wondering if
>>>>>> anyone else who's done the port to HR has come up with a solution they
>>>>>> find particularly elegant?
>>>>>> I assume this is a pretty common problem, so there must be a design
>>>>>> pattern out there... somewhere.
>>
>>>>>> Here are my ideas:
>>
>>>>>> - Clear the cache with a periodic task that re-clears it several times.
>>>>>> I'm thinking a recurring geometric retry would be prudent (1, 2, 4, 8,
>>>>>> 16, 32, 64, 128, 256, 512 seconds, and then pray that we have
>>>>>> consistency)
>>
>>>>>> - Checksum the modified or new board, and put that sum into memcache.
>>>>>> When generating the new board, confirm that any checksums are good.
>>>>>> This seems more deterministic, except I don't trust memcache not to
>>>>>> squelch the checksum record. So perhaps I should do something in the
>>>>>> datastore. This feels like it'd be about 10x as much code as the stupid
>>>>>> geometric flush.
>>
>>>>>> Any suggestions?
>>
>>>>>> -Joshua
>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups
>>>>> "Google App Engine" group.
>>>>> To post to this group, send email to [email protected].
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected].
>>>>> For more options, visit this group
>>>>> at http://groups.google.com/group/google-appengine?hl=en.
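On the question quoted above of whether the "augment the query results" logic can be reused instead of repeated in 50 places: a minimal sketch in plain Python, assuming entities are dicts with a `key` field and that the caller can re-apply the query's filter and sort order. `merge_fresh` and its parameters are hypothetical names, not part of any App Engine API, and it works on a list, so it does not help where the query is consumed as an iterator.

```python
def merge_fresh(results, fresh, matches, sort_key, key_of=lambda e: e["key"]):
    """Patch possibly stale query results with entities fetched by key.

    results:  list returned by an eventually consistent query.
    fresh:    dict of key -> entity from a strongly consistent get-by-key,
              or None when the entity was deleted.
    matches:  predicate that re-applies the query's filter to one entity.
    sort_key: function that re-applies the query's sort order.
    """
    # Drop every stale copy of an entity we have fresher data for.
    # This covers both the "replace" and the "remove" case.
    merged = [e for e in results if key_of(e) not in fresh]
    # Re-add fresh entities that still (or newly) satisfy the filter.
    # This covers the "add" case and completes "replace".
    merged.extend(e for e in fresh.values() if e is not None and matches(e))
    return sorted(merged, key=sort_key)


if __name__ == "__main__":
    stale = [
        {"key": "b1", "name": "Zoning", "town": "T"},
        {"key": "b2", "name": "Health", "town": "T"},
    ]
    fresh = {
        "b1": {"key": "b1", "name": "Appeals", "town": "T"},  # renamed
        "b2": {"key": "b2", "name": "Health", "town": "U"},   # no longer matches
        "b3": {"key": "b3", "name": "Parks", "town": "T"},    # newly added
    }
    merged = merge_fresh(
        stale, fresh,
        matches=lambda e: e["town"] == "T",
        sort_key=lambda e: e["name"],
    )
    print([e["name"] for e in merged])
```

The open question from the thread remains where the `fresh` entities come from; passing the changed keys along with the request, as Tom suggested, is one option.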
