Oh, I thought of one more detail. I've been using a daily cron job for the past 3 months, and it needs to work: if it doesn't, my code is broken for the day.
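A rough sketch of the named-task idea covered in this message, in case it helps. It is not my actual code: FakeQueue, daily_task_name and cron_handler are made-up names, and the in-memory queue only stands in for the real taskqueue.add(name=...) call, which on App Engine raises TaskAlreadyExistsError for a duplicate name. Each cron (primary or backup) tries to add the same date-based task name and simply passes if the task is already there:

```python
import datetime


class TaskAlreadyExistsError(Exception):
    """Stand-in for App Engine's taskqueue.TaskAlreadyExistsError."""


class FakeQueue:
    """Hypothetical in-memory queue that rejects duplicate task names."""

    def __init__(self):
        self.tasks = {}

    def add(self, name, url):
        if name in self.tasks:
            raise TaskAlreadyExistsError(name)
        self.tasks[name] = url


def daily_task_name(day):
    # One name per calendar day, so every cron firing that day collides on it.
    return "daily-work-%s" % day.strftime("%Y-%m-%d")


def cron_handler(queue, day):
    """All the cron request does: try to enqueue the day's single named task.

    Returns True if this call created the task, False if an earlier cron did.
    """
    try:
        queue.add(name=daily_task_name(day), url="/tasks/daily-work")
        return True
    except TaskAlreadyExistsError:
        # An earlier cron already added it; pass instead of retrying the add.
        return False


queue = FakeQueue()
today = datetime.date(2010, 2, 26)
# Three crons (e.g. 30, 15 and 5 minutes ahead) all hit the same handler.
results = [cron_handler(queue, today) for _ in range(3)]
print(results)              # [True, False, False]
print(sorted(queue.tasks))  # ['daily-work-2010-02-26']
```

The point is that the backup crons cost nothing when the first one succeeds: only one task named for that day ever exists. On App Engine, I believe a recently used name can also raise TombstonedTaskError, which you would probably want to treat the same way.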
So, the two main tricks I've learned are these:

1. Put as little functionality as possible in the cron part of the handler (for me this is the get, which checks for the cron headers etc.). All my code does for the cron is add a single task to the queue, and that task ends up doing the important work.

2. Use unique names for the task that gets added by the cron, and then have one or two backup crons (spread apart ahead of time in case there is some transient issue) that just attempt to create the same named task. Then just make sure to use unique names for the next batches of tasks that are created, and have the taskqueue.add() part catch TaskAlreadyExistsError exceptions and pass on them (instead of retrying the task add).

For me, cron jobs that were merely adding one task to do the important work were successful 99.9% of the time, but that one failure was no good. So I needed the backup crons (one runs 30 minutes before the task it's adding should execute, the next runs 15 minutes before, and the last one runs 5 minutes before).

This will probably require some significant rewriting of your code, but just make sure you're using version control (Mercurial with code.google.com works great for me) and you should be able to easily keep track of the changes you're making.

On Fri, Feb 26, 2010 at 4:19 PM, Marc Provost <[email protected]> wrote:
> Thanks for your quick reply!
>
> I parse several external sources (around 8) and from each source I
> need to update the same 1000 entities (most of them already existing,
> creation of new entities is rare). For each data source, I schedule a
> cron job which spawns 1000 tasks (with attached data) and each of them
> will update a single entity. I found by trial and error that app
> engine was behaving better the shorter the tasks. So, when I say very
> much parallelized, I mean spawning as many tasks as I can for each
> cron job, each of them as small as possible.
> Since I have more
> independent tasks running in parallel, my cron jobs execute faster. In
> addition, I schedule my cron jobs apart so that they don't overlap,
> but this should not matter, as I use the same queue which is limited
> at 5 tasks per second.
>
> So, in summary, I have 8 cron jobs and each cron job spawns 1000
> tasks. A given cron job and its children tasks terminate in 3-4
> minutes at most. The cron jobs are separated so that 2 cron jobs
> never execute together.
>
> Marc
>
> On Feb 26, 3:47 pm, Eli Jones <[email protected]> wrote:
> > How many is "a bunch"? Also, you say "they are all very much parallelized"
> > but then you say that you've scheduled them 10 minutes apart and they don't
> > overlap.. those two statements are contradictory, please explain more
> > clearly your cron-taskqueue setup and how it works and what exactly it is
> > doing.
> >
> > When you say that the cron jobs "spawn tasks that write to one entity
> > each".. what do you mean? The cron job is there to fire off the initial
> > task.. and that task runs once, putting one entity and that's it?
> >
> > If so, why are you having these tasks only put one entity at a time..
> > instead of creating multiple entities and putting them in batches? Does
> > each task put() new entities? or are they sometimes putting an entity that
> > may already exist?
> >
> > More info is more better for help.
> >
> > On Fri, Feb 26, 2010 at 3:35 PM, Marc Provost <[email protected]> wrote:
> > > Ok, here's my situation:
> > >
> > > * I use the java implementation and my app id is poolfana.
> > > * I have a bunch of cron jobs scheduled at night (Eastern Time)
> > > * They are all very much parallelized. I am being very strict: they
> > >   spawn tasks that only write to one entity each. Each task will
> > >   execute in a few hundred ms.
> > > * A given cron job and its spawned tasks will terminate in a few
> > >   minutes at most.
> > > * I have scheduled each cron job at least 10 minutes apart, so they do
> > >   not overlap.
> > > * In my dashboard, my max request per second is 3. The max limit is
> > >   supposed to be 30.
> > > * My problem? The cron jobs fail sporadically (marked as "failed" in
> > >   the dashboard) with this error:
> > >
> > > "Request was aborted after waiting too long to attempt to service your
> > > request. Most likely, this indicates that you have reached your
> > > simultaneous dynamic request limit. This is almost always due to
> > > excessively high latency in your app. Please see
> > > http://code.google.com/appengine/docs/quotas.html for more details."
> > >
> > > There is an issue for this problem:
> > > http://code.google.com/p/googleappengine/issues/detail?id=2396
> > >
> > > It was starred 50+ times, but it was not acknowledged yet by the
> > > google team. I'm writing this post to discuss potential workarounds,
> > > potential misuses of the API with the google team or other people that
> > > might have solved this problem. What else can I do? Is it a problem on
> > > the google side or am I doing something wrong? Right now, I need to
> > > re-execute the cron jobs manually every day...
> > >
> > > Thank you!
> > > Marc
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups
> > > "Google App Engine" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to [email protected].
> > > For more options, visit this group at
> > > http://groups.google.com/group/google-appengine?hl=en.
