Christopher Smith wrote:
Darren New wrote:
Christopher Smith wrote:
Um, well, sure you do. If I get a division by zero error, I'm pretty
sure my introspection is still going to work.
Not at all. You have two choices with division by zero errors: either
you have a logical problem in your code (in which case you have a
logical error in your code, a.k.a. a bug, and dumping core is a very
good way to help you understand the problem) or somehow memory has
become corrupted (in which case you have an unexpected error). In the
latter case you can't trust that introspection works.
Right. Note that I'm talking about SAFE LANGUAGES. :-) The last
sentence there is exactly why I dislike unsafe languages.
Yes, the raw power of being able to say, "well, the language says it
shouldn't" tends to be helpful when you are busily erasing millions of
dollars worth of transactions, or perhaps are a navy ship without
navigation at sea. ;-)
If you can't trust that your introspection keeps working when your
language says it should, then what do you trust? You're basically
arguing that since there can always be some sort of failure, you
shouldn't code to handle the errors you can.
I disagree.
And no, dumping core is entirely inappropriate in many circumstances.
It's the cause of all kinds of messiness in a lot of system services
that wind up having to fork off children just in case some piece of
code dumps core.
First, if we're using an imaginary piece of perfect hardware with a
perfect OS with perfect drivers on a perfect runtime, then we can assume
that we've made a logical error here, and you very much want to dump
core and stop doing anything so that the error can be corrected or
avoided.
Yes, you've made a logical error. No, the appropriate response isn't to
dump core, but to dump the part of the program that caused the error.
Put it this way: take your earlier example. You get a division by zero
error. My preference is to catch that exception, clean up, and start
over. You say it's better to dump core, because you can't trust anything
to be perfect. My question for you is: How can you trust that the kernel
will correctly dump your core? Isn't dumping core dangerous if the
memory corruption may have hit the tables that tell the kernel where
the free space is? What if dumping core makes the processor burst into
flame due to some previously unencountered relationship between the CPU
fan and the hard drive?
Let's just say for a second you did catch the error, how would
you suggest you proceed at this point, if you were again using this
imaginary system and you did catch a divide by zero error?
I'll tell you how I do it now:
1) Collect a stack trace detailing where in the program the error occurred.
2) Open a socket to the logging server, writing the stack trace.
3) Close all open files, sockets, and free all global variables. Abort
all processes waiting on a timer.
4) Delete all temp files of the name patterns this process generates.
5) Reread the sources to the program, in case it has been fixed, thereby
redefining all the procedures it uses.
6) Reload all configuration.
7) Start running again.
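A minimal sketch of that recovery loop, in Python purely for illustration (all names here are hypothetical, and steps 3 and 5 are elided):

```python
import glob
import logging
import os
import traceback

def recovery_loop(run_once, reload_config, temp_pattern="/tmp/myapp-*"):
    """Supervise run_once(): on error, log a trace, clean up, restart."""
    while True:
        config = reload_config()          # step 6: reload configuration
        try:
            run_once(config)
            return                        # finished normally
        except ZeroDivisionError:
            # Steps 1-2: collect a stack trace and log it (a real service
            # would ship this to a logging server over a socket).
            logging.error("recovering from error:\n%s",
                          traceback.format_exc())
            # Step 4: delete temp files matching this process's patterns.
            for path in glob.glob(temp_pattern):
                os.remove(path)
            # Steps 3 and 5 (closing descriptors, rereading sources) are
            # omitted here; step 7 is simply the loop continuing.
```

The point of the sketch is only the shape: the handler does its cleanup and then falls back into the top of the loop rather than terminating the process.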
If I'm writing something like an editor (not necessarily a text editor,
mind), I save out to a different file the current snapshot of what the
user has been working on. While I'm sure it's handy for the programmer
to just dump core, I don't imagine the user would want to be guaranteed
to lose all work.
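That "save a snapshot before dying" idea can be sketched like so (hypothetical names, Python only for illustration; the snapshot goes to a fresh file rather than overwriting the original, since the in-memory state may already be suspect):

```python
import json
import tempfile

def save_crash_snapshot(document, prefix="editor-recover-"):
    """Write the user's current work to a new file before aborting."""
    with tempfile.NamedTemporaryFile("w", prefix=prefix,
                                     suffix=".json", delete=False) as f:
        json.dump(document, f)
        return f.name

def edit_loop(document, apply_edit, edits):
    try:
        for e in edits:
            apply_edit(document, e)
    except Exception:
        # Snapshot whatever survives, then re-raise for the programmer.
        path = save_crash_snapshot(document)
        raise RuntimeError("crashed; work saved to " + path)
```

The programmer still gets the error to debug; the user gets a recovery file instead of a guaranteed total loss.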
Would you
somehow know, despite the unexpected nature of the problem, which
magical variable needs to be changed to some other value than zero?
Why would you think that's part of the recovery strategy?
*unexpected* error. Given that you didn't anticipate this problem, how
do you know that your fix isn't going to do more harm than good?
How do you know your core dump isn't?
Secondly, I don't find having to fork off children causes all kinds of
messiness.
Bully for you.
Indeed, it tends to be a lot less messy than the alternative.
Only in operating systems that do the error handling for you. Otherwise,
it's counterproductive.
I know! I can keep my MS-DOS from crashing, just by starting up a
separate ... Oh, wait...
Beyond the important factor that I've highlighted, what is the huge
difference between unwinding a stack until some generalized catch block
is found that somehow tries to deal with the problem vs. a process dying
and some parent process with a generalized SIG_CHILD handler somehow
tries to deal with the problem?
Nothing. That's why I'm wondering why you think my method is so
dangerous compared to yours.
The difference is that I can log appropriate messages to the appropriate
places about why it went wrong, I can clean up stuff the OS isn't going
to (like the temp files), and I can decide on the severity of the
problem before taking action.
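The parent-process side of that comparison might look like this (a sketch assuming a POSIX system, with Python standing in for illustration; a long-running service would do this from a SIGCHLD handler rather than a blocking wait):

```python
import os

def supervise(child_main, max_restarts=3):
    """Fork child_main in a child process and, like a parent with a
    generalized SIGCHLD handler, restart it when it dies abnormally."""
    for attempt in range(max_restarts):
        pid = os.fork()
        if pid == 0:
            # Child process: do the real work.
            try:
                child_main(attempt)
                os._exit(0)          # clean exit
            except Exception:
                os._exit(1)          # abnormal death; parent restarts us
        # Parent process: wait for the child and inspect how it died.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            return attempt           # child finished normally
        # Here the parent could log the failure, remove the child's
        # temp files, etc., before looping around to restart it.
    raise RuntimeError("child kept dying; giving up")
```

Structurally it is the same generalized-handler pattern as unwinding to a catch block; the difference under debate is only which side of the process boundary the cleanup lives on.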
C++ makes it entirely possible for you to do things without doing
anything unspecified.
Of course it does. That's always the mantra. "It's *possible* to write a
large program without ever making a programming error. Hence, we don't
need to check for programming errors."
If you want, you could add a little lint check in
your build process that verifies that you always use smart pointers,
bounds checkers, etc.
Not really. Only if you restrict yourself to very primitive data structures.
I guess the difference is that a C++ developer
would still recognize that the platform he/she is running on isn't
perfect, and so when they see an unexpected error, the best strategy is
to get out of Dodge.
If you were arguing that the best strategy would be to power off the
machine until the problem can be analyzed, I might agree. Given that
you're going to worry about hardware errors, compiler errors, etc., I
don't see why you think dumping core is safer than doing the same
thing from within your program. If your third CPU register is stuck at
zero, or your hard drives are on fire, or your memory corruption has
overwritten your kernel, starting a new process isn't going to be any
safer. BTDTGTTS.
...and that's an assumption that you can't make and exactly why it is
"undefined" rather than "unspecified".
Yep. That's why you should be using a language that actually works.
Holding up Java as a purely "safe" language when there are unsafe
libraries you are required to use is erroneous. The language itself is
safe, but if you implement parts in an unsafe language and get them
wrong, I don't think you can say the implementation is safe.
--
Darren New / San Diego, CA, USA (PST)
His kernel fu is strong.
He studied at the Shao Linux Temple.
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg