Re: Linux C/C++ IDE

Darren New Mon, 11 Jun 2007 08:12:27 -0700

Christopher Smith wrote:

> If you are talking about programming errors, then recovery is the wrong kind of behavior. At most you want the code to checkpoint and then coredump, preferably in some suitably annoying way that causes someone to notice.

Errr, no, not really. If someone puts a CGI script on my shared webserver that has a bug in it, I don't really want to coredump the webserver and take down my whole business, just so I notice.

Instead, I recover gracefully and send myself a page, which is suitably annoying enough to notice, thankyouverymuch.


> When you have hundreds of nodes to manage, you don't want programmer errors floating around the ether and code trying to recover from it.

"Recover from it" involves logging the error, cleaning up resources, and restarting. Or, if it's user-submitted code, setting a flag saying not to try to run that any more, telling the user, etc.

Why would I want to leave files hanging open, sockets connected, memory allocated, and database transactions inconsistent? Oh, wait, that doesn't happen because people wrote this OS that tries (unsuccessfully) to clean up after you when you fail.
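
To make "recover from it" concrete, here is a rough sketch in Python of the kind of wrapper I have in mind. Every name in it (run_user_script, disabled, notifications) is made up for illustration, and the in-memory set and list merely stand in for a real "don't run this again" flag and a real pager. It is an assumption about how one might structure this, not a description of any particular server.

```python
import logging

logger = logging.getLogger("supervisor")

disabled = set()      # hypothetical: scripts flagged as "do not run again"
notifications = []    # hypothetical stand-in for paging or mailing someone

def run_user_script(name, script, resource):
    """Run one user-submitted callable without letting its bugs take down the host."""
    if name in disabled:
        return None                      # flagged earlier, so refuse to run it
    try:
        return script(resource)
    except Exception as exc:             # a programming error in the guest code
        logger.error("script %s failed: %r", name, exc)   # log the error
        disabled.add(name)                                 # don't try to run it again
        notifications.append((name, repr(exc)))            # tell someone
        return None
    finally:
        resource.close()                 # clean up whether or not the script failed
```

The host process keeps serving everyone else; the broken script gets logged, flagged, and reported, and whatever resource it was handed is released either way.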

> This violates the "fail fast" principle. After all, if some programmer in some unknown place did write code that divides by zero, how is your exception handler to know how to fix it?

It doesn't violate the "fail fast" principle any more than the OS closing open files and freeing your memory violates the "fail fast" principle. You're just saying "rely on the OS to be properly coded to do this for you" instead of "rely on your interpreter/compiler to be properly coded to do this for you."

I'm sorry, if you have hundreds of nodes, you have to *expect* hardware failures on a regular basis.

Right. But I don't need to do it at the same level of abstraction. If the whole rack catches on fire, dumping core or logging errors becomes irrelevant, and the next level of monitoring kicks in.

> Okay, I'm now thoroughly confused by what you mean by an unexpected error then. If it doesn't occur when the hardware does something unexpected,

Who said that? How many application programs have you written that check that, when they add two numbers, they come up with the right answer? Or that after you store something in an integer variable, it doesn't change before you read it next time?

Sure, if you're launching an interplanetary satellite, you check these things. Otherwise, the probability of that going south is too low to worry about. Far, far lower than the probability that you've made a coding error, or that what you coded isn't what you wanted, or that the program will live out its entire lifecycle without ever running across a problem that happens once every 2^128 instructions.

I mean, people still make fun of Intel for having had a bug in their math processor.

> the software does something unexpected,

Well, yes, that too. But again, how many application programs have you written that check that the compiler outputted the right code? Sure, if you're launching an interplanetary satellite, you look at the compiled code and check it against the source and make sure there aren't any problems.

How do you handle it when the kernel's bug decides to just not relaunch the file that just core dumped? What do you do when the file system starts writing directory blocks over top of your inode tables? What do you do when you start getting piles of processes that even a kill -9 won't get rid of?

There are some errors you don't bother trying to recover from automatically. (Altho that last one I actually wrote a script to recover from.)

> or the programmer does something unintended...

Right. Like there's never been a time when people wrote faulty code that a range-checking language (as an example) would have caught at runtime.
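
As a small illustration of that point, here is a sketch in Python, which happens to range-check list indexing at runtime. The function names are hypothetical and the off-by-one bug is deliberate. The point is only that the language reports the error instead of silently reading past the end of the list, so the caller gets a chance to handle it.

```python
def sum_with_off_by_one(values):
    """Sum a list, with a deliberate off-by-one bug in the loop bound."""
    total = 0
    for i in range(len(values) + 1):   # bug: runs one index past the end
        total += values[i]             # the runtime range check fires here
    return total

def checked_sum(values):
    """Return (total, None) on success or (None, error text) if the bug is hit."""
    try:
        return sum_with_off_by_one(values), None
    except IndexError as exc:
        return None, str(exc)
```

Calling checked_sum([1, 2, 3]) comes back with no total and an error description, rather than a sum that quietly includes whatever happened to sit after the list in memory.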

> Either that or the programmer is incompetent and is just as likely to screw up the error handling.

You have to handle the error *some* way. I'm not sure why handling inside your code the errors that are known not to screw up your language semantics isn't just as good as handling it in some other piece of code.

And no, while it's difficult to write good error handling, once you have the error handling in place, chances are it's covering a lot of code.

Once you have transactional rollback in your database, it too covers all kinds of errors, including your application dumping core.
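
A minimal sketch of that, using Python's standard sqlite3 module with an in-memory database. The accounts table, the transfer function, and the fail_midway switch are all invented for the example, and a raised exception stands in for the application dying partway through. One piece of error handling (the transaction) covers every statement inside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between two rows; all of it happens or none of it does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_midway:
                raise RuntimeError("simulated bug between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

After transfer(conn, "a", "b", 30, fail_midway=True), the balances are unchanged, because the half-finished debit was rolled back along with everything else in the transaction.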


--
  Darren New / San Diego, CA, USA (PST)
    His kernel fu is strong.
    He studied at the Shao Linux Temple.

