Those of you who participated in the first few hours of the qualification
round may have noticed some problems with the site: a large fraction of the
time, people who tried to download an input were told that there had been a
server error and that they should try again. First and foremost, we
apologize for this; we'd also like to explain what happened.

Summary

Google Code Jam runs as an app like any other on Google App Engine. Because
the Code Jam platform dates back to the very early days of App Engine
(before it was released to the public), we have a lot of very old code
lying around, including extensive use of a deprecated version of the API
that interacts with the datastore backend. Since we're its only users, that
API isn't as well tested as the "db" module that everyone else uses.
Shortly after the qualification round started, the App Engine team pushed a
new set of servers into production. This should have been seamless, and for
every other app it was, but the push introduced a bug in the old datastore
API. After roughly two hours of joint investigation, the App Engine team
brought up a set of servers running the old binary and Code Jam was moved
onto those servers, resolving the problem. To make up for the time the bug
cost some of our users, we extended the round by two hours.

Timeline

At 16:00 Pacific time (UTC-7:00), the qualification round began and
everything was working fine.
At 16:38 we began to get bug reports and to see strange errors in our logs:
"BadRequestError: id or name, but not both, must be set in each key path
element." We started looking for a problem in our own code.
At 16:59 we contacted the on-call App Engine developer.
By 17:15 all the right people on the App Engine team were working on the
problem. They had also determined that we were the only affected app.
At 18:15 we determined exactly what the problem was, which servers were
running the problematic code, and that a rollback was likely to solve it.
At 18:26 we moved onto a small set of servers running the old binary, and
the problem was temporarily resolved.
At 18:36 we overloaded those servers and had to move back onto the servers
running the new binary.
At 19:00 we moved onto a much larger number of servers running the old
binary, and the problem was resolved.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"google-codejam" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-code?hl=en
-~----------~----~----~----~------~----~------~--~---
