Those of you who participated in the first few hours of the qualification round may have noticed some problems with the site: a high percentage of the time, people who tried to download an input were told that there had been a server error, and that they should try again. First and foremost we apologize for this; we'd also like to explain what happened.
Summary

Google Code Jam runs as an app like any other on Google App Engine. Since the Code Jam platform has been around since the very early days of App Engine -- before it was released to the public -- we have a lot of very old code lying around, including extensive use of a deprecated version of the API that interacts with the datastore backend. Since we're its only users, that API isn't as well-tested as the "db" module that everyone else uses.

Shortly after the qualification round started, the App Engine team pushed a new set of servers into production. This should have been seamless, and for every other app it was; but the push introduced a bug in the old datastore API. After roughly two hours of joint investigation, the App Engine team brought up a set of servers using the old binary and Code Jam was moved onto those servers, resolving the problem. To make up for the time the bug cost some of our users, we extended the round by two hours.

Timeline

At 16:00 Pacific time (UTC-7:00), the qualification round began and everything was working fine.

At 16:38 we began to get bug reports and to see strange errors in our logs:

    BadRequestError: id or name, but not both, must be set in each key path element.

We started looking for a problem in our own code.

At 16:59 we contacted the App Engine developer oncall.

By 17:15 all the right people on the App Engine team were working on the problem. They'd also determined that we were the only affected app.

At 18:15 we determined exactly what the problem was, which servers were running the problematic code, and that a rollback was likely to solve it.

At 18:26 we moved onto a small set of servers running the old binary, and the problem was temporarily resolved.

At 18:36 we overloaded those servers, and had to move back onto the servers running the new binary.

At 19:00 we moved onto a much larger number of servers running the old binary, and the problem was resolved.
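For readers unfamiliar with the error in the logs, here is a rough illustration of the rule its message describes: each element of a datastore key path carries a kind plus either a numeric id or a string name, never both. This is only a sketch under our own assumptions, not App Engine's code; the function `validate_key_path`, the tuple layout, and the local `BadRequestError` class are hypothetical stand-ins made up for this example.

```python
class BadRequestError(Exception):
    """Hypothetical stand-in for the datastore error seen in the logs."""


def validate_key_path(path):
    """Check a key path given as a list of (kind, id, name) tuples.

    Assumption for illustration: exactly one of id or name must be set
    in every element; anything else raises BadRequestError.
    """
    for kind, id_, name in path:
        has_id = id_ is not None
        has_name = name is not None
        if has_id == has_name:  # both set, or neither set
            raise BadRequestError(
                "id or name, but not both, must be set in each key path element."
            )
    return True


# A well-formed path: the first element uses an id, the second a name.
validate_key_path([("Contest", 1, None), ("Input", None, "small")])
```

A path element such as ("Contest", 1, "qual"), with both an id and a name, would be rejected by this check.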
