Christopher Smith wrote:
If you are talking about programming errors, then recovery is the wrong kind of behavior. At most you want the code to checkpoint and then core dump, preferably in some suitably annoying way that causes someone to notice.

By the way, that *is* the kind of recovery I'm talking about. It's exactly that kind of recovery that C++ and other unsafe languages prevent. Why do you think this isn't recovery? You're not Zen enough about your programming environment. You need to learn some languages where the distinction between "operating system" and "application" is as blurred as the distinction between "process" and "thread" is becoming.

If the programming error of "whoops, forgot to limit the input to the size of my buffer" made the program checkpoint and dump core, we wouldn't have zillions of viruses floating around out there.
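To make that concrete, here is a minimal sketch in Python (my choice of language for illustration; the helper name `copy_into` is made up). The point is only that a checked language turns the overrun into an error you can catch and report, instead of silently scribbling past the end of the buffer.

```python
def copy_into(buf, data):
    """Copy data into a fixed-size buffer one byte at a time.

    Deliberately omits the length check: this is the "whoops" case.
    """
    for i, byte in enumerate(data):
        buf[i] = byte  # raises IndexError once i reaches len(buf)


buf = bytearray(8)           # fixed-size buffer
try:
    copy_into(buf, b"A" * 16)  # input is twice the buffer size
except IndexError as exc:
    # The runtime stopped the write at the buffer boundary; nothing
    # beyond the buffer was touched, and the failure is visible.
    print("overrun caught:", exc)
```

The bug is still a bug, but it fails loudly at the point of the mistake, which is what makes the checkpoint-and-report step possible at all.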

I checkpoint (which requires introspection), dump core (or rather, I just print a stack trace and send it to the database and/or mail it to the support group), and then restart. Usually I don't have to restart the whole process, and if I can tell what the error was, I can decide whether a restart is worth it at all. Chances are that a "socket already in use" error isn't going to be cleared up by restarting the server.
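A rough sketch of that loop, again in Python and under stated assumptions: `supervise`, `report`, `should_restart` and `NO_RESTART_ERRNOS` are hypothetical names, `sink` stands in for whatever writes to the database or mails the support group, and "restart" here means re-running the task inside the same process. The checkpoint step is left out because it depends entirely on the application's state.

```python
import errno
import traceback

# Assumption: errors that a restart will not clear up,
# e.g. "socket already in use".
NO_RESTART_ERRNOS = {errno.EADDRINUSE}


def report(exc, sink):
    """Format the stack trace and hand it to sink (database, mail, ...)."""
    text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    sink(text)
    return text


def should_restart(exc):
    """Decide from the error itself whether restarting could help."""
    if isinstance(exc, OSError) and exc.errno in NO_RESTART_ERRNOS:
        return False
    return True


def supervise(task, sink, max_restarts=3):
    """Run task; on failure, report it and restart if that might help.

    Returns the number of attempts it took to finish successfully.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return attempts
        except Exception as exc:
            report(exc, sink)
            if not should_restart(exc) or attempts > max_restarts:
                raise  # give up loudly so someone notices
```

Something like `supervise(run_server, mail_support)` would be the call site, where both arguments are placeholders for your own functions.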

--
  Darren New / San Diego, CA, USA (PST)
    His kernel fu is strong.
    He studied at the Shao Linux Temple.

--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
