On 8/3/15 6:54 PM, Walter Bright wrote:
On 8/3/2015 8:50 AM, Steven Schveighoffer wrote:
1. They aren't removed, they are replaced with a nearly useless segfault.

Not useless at all:

1. The program does not continue running after it has failed. (Please,
let's not restart that debate.

You can run some code that is fresh, i.e. don't *continue* running failing code, but it's ok, for example, to run a signal handler, or call a pure function. My thought is that you should print the file location and exit (perhaps with segfault).

2. Running it under a debugger, the location of the fault will be
identified.

Yeah, the real problem is when you are not running under a debugger. If you have an intermittent bug, or one that is in the field, a stack trace of the problem is worth 100x more than a segfault and trying to talk a customer through setting up a debugger, or even setting up their system to do a core dump for you to debug. They aren't always interested in that enough to care.

But also, a segfault has a specific meaning. It means, you tried to access memory you don't have access to. This clearly is not that, and it sends the wrong message -- memory corruption. This is not memory corruption, it's a program error.

2. If we are going to put something in there instead of "assert", why
not just
throw an error?

Because they are expensive.

They are expensive when? 2 microseconds before the program ends? At that point, I'd rather you spend as much resources telling me what went wrong as possible. More info the better.

Now, if it's expensive to include the throw vs. not including it (I don't know), then that's a fair point. If it means an extra few instructions to set up the throw (which isn't run at all when the assert(0) line isn't run), that's not convincing.

To keep asserts in all their glory in released code, do not use the
-release switch.

OK, this brings up another debate. The thing that triggered all this is an issue with core.time, see issue https://issues.dlang.org/show_bug.cgi?id=14863

Essentially, we wrote code to get all the clock information at startup on a posix system that supports clock_gettime, which is immutable and can be read easily at startup. However, how we handled the case where a clock could not be fetched is to assert(0, "don't support clock " ~ clockname).

The clear expectation was that the message will be printed (along with file/line number), and we can go fix the issue.

On Linux 2.6.32 - a supported LTS release I guess - one of these clocks is not supported. This means running a simple "hello world" program crashes at startup in a segfault, not a nice informative message.

So what is the right answer here? We want an assert to trigger for this, but the only assert that stays in is assert(0). However, that's useless if all someone sees is "segfuault". "some message" never gets printed, because druntime is compiled in release mode. I'm actually quite thrilled that someone figured this all out -- one person analyzed his core dump to narrow down the function, and happened to include his linux kernel version, and another person had the arcane knowledge that some clock wasn't supported in that version.

Is there a better mechanism we should be using in druntime that is clearly going to be compiled in release mode? Should we just throw an assert error directly? Clearly the code is not "corrupted" at this point, it's just an environmental issue. But we don't want to continue execution. What is the right call?

At the very least, assert(0, "message") should be a compiler error, the message is unused information.

-Steve

Reply via email to