Any idea why three of the posts at the top of the corkboard say "error contacting cajoling service"?
* The communication with the cajoling service is unreliable. I think we're running into App Engine HTTP client timeouts, but the errors don't indicate that for sure. It might help to switch to async requests so that all the cajoling for a given page occurs in parallel.
* The caching of cajoled content is unnecessarily ephemeral, so the failure described above gets a chance to occur more often than it has to. This would probably not be a problem if the site had more traffic keeping the entries in the cache; alternatively, we could switch to keeping the cache in the database service instead of the memcache service.
Also, if a failure that isn't an explicit timeout occurs, then Caja Corkboard caches that failure for nominally 5 seconds (but in practice, longer than that); this was intended to prioritize fast page loads over retrying in the event that the failure really is a permanent failure.
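To make the failure-caching behaviour concrete, here is a minimal sketch of the idea. It is not the actual Corkboard code: a plain dict stands in for memcache, and the names CajoleCache, CajoleTimeout, and FAILURE_TTL_SECONDS are made up for illustration. Only the 5-second figure comes from the description above.

```python
import time

FAILURE_TTL_SECONDS = 5  # the nominal figure mentioned above


class CajoleTimeout(Exception):
    """Hypothetical: raised when the request to the service explicitly timed out."""


class CajoleCache:
    """In-memory stand-in for the real cache (an assumption, not memcache)."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}  # key -> (expires_at or None, succeeded, value_or_error)
        self._clock = clock

    def get(self, key, compute):
        entry = self._entries.get(key)
        if entry is not None:
            expires_at, succeeded, payload = entry
            if expires_at is None or self._clock() < expires_at:
                if succeeded:
                    return payload
                raise payload  # replay the cached failure without retrying
        try:
            value = compute(key)
        except CajoleTimeout:
            raise  # explicit timeouts are not cached, so the next load retries
        except Exception as err:
            # Any other failure is remembered briefly: fast page loads win
            # over retrying something that may be permanently broken.
            expires_at = self._clock() + FAILURE_TTL_SECONDS
            self._entries[key] = (expires_at, False, err)
            raise
        self._entries[key] = (None, True, value)
        return value
```

With a cache like this, a page reloaded within the TTL sees the stored error immediately instead of waiting on the service again.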
Regarding doing something about this, I think the first thing to try is switching to async URL Fetch requests. If I recall the API properly, it should be possible to change cajole.py's interface so that we can say "I'm going to want the cajoled form of <some html>" for each posting and then later "I actually need it now, block if you haven't got it yet" -- which is no worse than promises WRT the CPU-heavy aspects.
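Roughly, the interface change could look like the sketch below. To keep it runnable outside App Engine it uses a standard-library thread pool rather than URL Fetch; on App Engine the equivalent pieces would presumably be urlfetch.create_rpc(), urlfetch.make_fetch_call(), and rpc.get_result(). The names Cajoler, request, fake_cajole, and render_page are invented for the example and are not what cajole.py currently exposes.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_cajole(html):
    """Stand-in for the HTTP round trip to the cajoling service (assumption)."""
    return "<cajoled>" + html + "</cajoled>"


class Cajoler:
    def __init__(self, cajole_fn=fake_cajole, max_workers=8):
        self._cajole_fn = cajole_fn
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def request(self, html):
        """'I'm going to want the cajoled form of this': start now, return a handle."""
        return self._pool.submit(self._cajole_fn, html)

    def close(self):
        self._pool.shutdown()


def render_page(postings, cajoler):
    # First pass: kick off every request so they run in parallel.
    handles = [cajoler.request(html) for html in postings]
    # Second pass: 'I actually need it now'; result() blocks only if the
    # response for that posting has not arrived yet.
    return [handle.result() for handle in handles]
```

The point of the two passes is that the total wait is closer to the slowest single request than to the sum of all of them, which should make it less likely that a page runs into the timeout.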
I *may* have time to work on this over the coming weekend. Please let me know what priority you place on it (vs. canvas and Caja-CapTP work).
-- Kevin Reid <http://switchb.org/kpreid/>
