Christopher Smith wrote:
Darren New wrote:
Christopher Smith wrote:
Um, well, sure you do. If I get a division by zero error, I'm pretty sure my introspection is still going to work.
Not at all. You have two choices with division by zero errors: either you have a logical problem in your code (in which case you have a logical error in your code, a.k.a. a bug, and dumping core is a very good way to help you understand the problem) or somehow memory has become corrupted (in which case you have an unexpected error). In the latter case you can't trust that introspection works.

Right. Note that I'm talking about SAFE LANGUAGES. :-) The last sentence there is exactly why I dislike unsafe languages.
Yes, the raw power of being able to say, "well, the language says it shouldn't" tends to be helpful when you are busily erasing millions of dollars worth of transactions, or are perhaps a navy ship at sea without navigation. ;-)

If you can't trust that your introspection keeps working when your language says it should, then what do you trust? You're basically arguing that since there can always be some sort of failure, you shouldn't code to handle the errors you can.

I disagree.

And no, dumping core is entirely inappropriate in many circumstances. It's the cause of all kinds of messiness in a lot of system services that wind up having to fork off children just in case some piece of code dumps core.
First, if we're using an imaginary piece of perfect hardware with a perfect OS with perfect drivers on a perfect runtime, then we can assume that we've made a logical error here, and you very much want to dump core and stop doing anything so that the error can be corrected or avoided.

Yes, you've made a logical error. No, the appropriate response isn't to dump core, but to dump the part of the program that caused the error.

Put it this way: take your earlier example. You get a division by zero error. My preference is to catch that exception, clean up, and start over. You say it's better to dump core, because you can't trust anything to be perfect. My question for you is: How can you trust that the kernel will correctly dump your core? Isn't dumping core dangerous if the memory corruption may have hit the tables that tell the kernel where the free space is? What if dumping core makes the processor burst into flame due to some previously unencountered interaction between the CPU fan and the hard drive?

Let's just say for a second you did catch the error, how would you suggest you proceed at this point, if you were again using this imaginary system and you did catch a divide by zero error?

I'll tell you how I do it now:
1) Collect a stack trace detailing where in the program the error occurred.
2) Open a socket to the logging server, writing the stack trace.
3) Close all open files, sockets, and free all global variables. Abort all processes waiting on a timer.
4) Delete all temp files of the name patterns this process generates.
5) Reread the sources to the program, in case it has been fixed, thereby redefining all the procedures it uses.
6) Reload all configuration.
7) Start running again.
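The steps above can be sketched roughly as follows. This is a minimal, self-contained Python sketch, not the poster's actual code: `run()`, `recover()`, the temp-file pattern, and the "fail once, then succeed" behavior are all hypothetical stand-ins, and the real step 2 would write to a logging server over a socket rather than to stderr.

```python
import glob
import os
import sys
import tempfile
import traceback

# Hypothetical stand-ins for the real program's names.
TEMP_DIR = tempfile.gettempdir()
TEMP_PATTERN = os.path.join(TEMP_DIR, "myapp-*.tmp")

attempts = {"n": 0}

def run():
    """Hypothetical main entry point: hits a divide-by-zero once, then succeeds."""
    attempts["n"] += 1
    # Simulate the kind of temp file the process generates.
    open(os.path.join(TEMP_DIR, "myapp-scratch.tmp"), "w").close()
    if attempts["n"] == 1:
        return 1 // 0          # the divide-by-zero from the example
    return "ok"

def recover(exc):
    # 1) Collect a stack trace detailing where the error occurred.
    trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    # 2) In the real program this goes to a logging server over a socket.
    sys.stderr.write(trace)
    # 3) Close open files/sockets, free globals (nothing to do in this sketch).
    # 4) Delete all temp files matching this process's name pattern.
    for path in glob.glob(TEMP_PATTERN):
        os.unlink(path)
    # 5/6) Re-reading sources and configuration would happen here; step 5 is
    # natural in a language that can redefine procedures at runtime.

result = None
while result is None:
    try:
        result = run()          # 7) start running again
    except ZeroDivisionError as exc:
        recover(exc)

print(result)                   # prints "ok" after one recovery cycle
```

Note that step 5 (redefining procedures from re-read sources) is the one piece that assumes a dynamic language; in C or C++ the closest equivalent would be re-exec'ing the (possibly rebuilt) binary.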


If I'm writing something like an editor (not necessarily a text editor, mind), I save out to a different file the current snapshot of what the user has been working on. While I'm sure it's handy for the programmer to just dump core, I don't imagine the user would want to be guaranteed to lose all work.
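The "save a snapshot to a different file before bailing out" idea can be sketched like this. Everything here is illustrative: `EditorBuffer` and its contents are hypothetical, and a real editor would snapshot much more state. The write-to-temp-then-rename step is the standard trick for making sure a crash mid-write can't destroy both the original and the recovery copy.

```python
import os
import tempfile

class EditorBuffer:
    """Hypothetical stand-in for an editor's in-memory document."""
    def __init__(self, path, text):
        self.path = path
        self.text = text

def snapshot(buffer):
    """Write the user's work to a *different* file than the original.

    Writing to a temp file and renaming it into place means the recovery
    copy is either the old complete snapshot or the new complete one,
    never a half-written file.
    """
    recovery_path = buffer.path + ".recovered"
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(recovery_path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(buffer.text)
    os.replace(tmp, recovery_path)   # atomic rename on POSIX
    return recovery_path

buf = EditorBuffer("notes.txt", "hours of unsaved work")
try:
    1 // 0                      # some internal logic error
except ZeroDivisionError:
    saved = snapshot(buf)       # the user keeps their work
    # ...then log, clean up, and restart or exit gracefully.
```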

Would you somehow know, despite the unexpected nature of the problem, which magical variable needs to be changed to some other value than zero?

Why would you think that's part of the recovery strategy?

*unexpected* error. Given that you didn't anticipate this problem, how do you know that your fix isn't going to do more harm than good?

How do you know your core dump isn't?

Secondly, I don't find having to fork off children causes all kinds of messiness.

Bully for you.

Indeed, it tends to be a lot less messy than the alternative.

Only in operating systems that do the error handling for you. Otherwise, it's counterproductive.

I know! I can keep my MS-DOS from crashing, just by starting up a separate ... Oh, wait...

Beyond the important factor that I've highlighted, what is the huge difference between unwinding a stack until some generalized catch block is found that somehow tries to deal with the problem vs. a process dying and some parent process with a generalized SIGCHLD handler somehow trying to deal with the problem?
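The two shapes being compared can be put side by side in a short sketch. This is POSIX-only (it uses `os.fork`) and `handle_failure()` is a hypothetical stand-in for "some generalized handler"; a real parent would install an asynchronous SIGCHLD handler rather than call `waitpid` synchronously, but the information it receives is the same.

```python
import os

def handle_failure(reason):
    """Hypothetical generalized handler; both shapes funnel into it."""
    return f"recovered from: {reason}"

# Shape 1: unwind the stack to a generalized catch block.
def in_process():
    try:
        return 1 // 0
    except ZeroDivisionError as exc:
        return handle_failure(type(exc).__name__)

# Shape 2: let a child process die and have the parent notice.
# (A SIGCHLD handler would learn this asynchronously; waitpid
# shows the same child-status information synchronously.)
def out_of_process():
    pid = os.fork()
    if pid == 0:                 # child: the failing "process"
        os._exit(70)             # an EX_SOFTWARE-style exit code
    _, status = os.waitpid(pid, 0)
    return handle_failure(f"child exit {os.waitstatus_to_exitcode(status)}")

print(in_process())
print(out_of_process())
```

Structurally they are the same move: a failure propagates outward until a boundary (catch block or process boundary) where generic recovery logic runs; the process boundary just buys harder isolation at the cost of losing in-process context.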

Nothing. That's why I'm wondering why you think my method is so dangerous compared to yours.

The difference is that I can log appropriate messages to the appropriate places about why it went wrong, I can clean up stuff the OS isn't going to (like the temp files), and I can decide on the severity of the problem before taking action.

C++ makes it entirely possible for you to do things without doing anything unspecified.

Of course it does. That's always the mantra. "It's *possible* to write a large program without ever making a programming error. Hence, we don't need to check for programming errors."

If you want, you could add a little lint check in your build process that verifies that you always use smart pointers, bounds checkers, etc.

Not really. Only if you restrict yourself to very primitive data structures.

I guess the difference is that a C++ developer would still recognize that the platform they're running on isn't perfect, and so when they see an unexpected error, the best strategy is to get out of Dodge.

If you were arguing that the best strategy would be to power off the machine until the problem can be analyzed, I might agree. Given that you're going to worry about hardware errors, compiler errors, etc., I don't see why you think dumping core is safer than doing the same thing from within your program. If your third CPU register is stuck at zero, or your hard drives are on fire, or your memory corruption has overwritten your kernel, starting a new process isn't going to be any safer. BTDTGTTS.

...and that's an assumption that you can't make and exactly why it is "undefined" rather than "unspecified".

Yep. That's why you should be using a language that actually works. Holding up Java as a purely "safe" language when there are unsafe libraries you are required to use is erroneous. The language itself is safe, but if you implement parts in an unsafe language and get them wrong, I don't think you can say the implementation is safe.

--
  Darren New / San Diego, CA, USA (PST)
    His kernel fu is strong.
    He studied at the Shao Linux Temple.

--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
