Built-in unsafety in D

bearophile Fri, 12 Mar 2010 05:50:52 -0800

This is a follow-up of this thread, and other older threads on this topic:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=104965


This is a nice article written in 2005 by Thomas Guest, "Built-in Type Safety?":
http://www.artima.com/cppsource/typesafetyP.html

It shows some bugs common in C++ code that I really hope D will help avoid.
It's 2010, so it's about time. (Note: version 4 of the C# language gives ways
to avoid them all.)

For me having a way to avoid most of those bugs is more important than:
- Having a good operator overload system;
- Having a way to break/ignore circular imports;
- Having actors;
- Having transitive immutability;
- Having true closures;
- Having good data structures in the standard library;
- Having efficient literal arrays;
- Having fast associative arrays, built-in or in a library;
- Having an efficient dynamic array append;
- Changing fixed sized arrays semantics to returning them by value;
- etc.


This is a compressed version of the function shown near the top of that article:

void signalUpdate(Signal update, Signal & stored) {
    int const tolerance = 10;
    if ((update > stored + tolerance) || (update < stored - tolerance)) {
        flashWriteSignal(update);
        stored = update;
    }
}


The bug was caused by:
typedef unsigned Signal;

Quoting the article:

So, the expression in signalsDifferent(10, 10, 20) evaluates: 10u > 10u + 20 || 
10u < 10u - 20
Now, when you subtract an int from an unsigned both values are promoted to 
unsigned and the result is unsigned. So, 10u - 20 is a very big unsigned 
number. Our expression is therefore equivalent to:
false||true which is of course true.
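
The arithmetic in that quote can be checked with a small C++ sketch. The
signature of signalsDifferent below is an assumption (the article's own
definition is not reproduced in this post), and signalsDifferentFixed is a
hypothetical alternative shown only for contrast, not the article's fix:

```cpp
#include <cassert>

typedef unsigned Signal;

// Assumed shape of the article's function: tolerance is a plain int, so
// "stored - tolerance" is converted to and computed in unsigned arithmetic.
bool signalsDifferent(Signal update, Signal stored, int tolerance) {
    return (update > stored + tolerance) || (update < stored - tolerance);
}

// Hypothetical alternative: compute the difference in a wider signed type
// (assumes long long is wider than unsigned, as on common platforms),
// so a negative difference stays negative instead of wrapping around.
bool signalsDifferentFixed(Signal update, Signal stored, int tolerance) {
    long long diff = static_cast<long long>(update) - static_cast<long long>(stored);
    return diff > tolerance || diff < -static_cast<long long>(tolerance);
}

void checkSignalExamples() {
    assert(10u - 20 > 10u);                      // the subtraction wrapped around
    assert(signalsDifferent(10, 10, 20));        // identical signals reported as different
    assert(!signalsDifferentFixed(10, 10, 20));  // no difference, as expected
    assert(signalsDifferentFixed(50, 10, 20));   // a real difference is still detected
}
```

Calling checkSignalExamples() from a test runs all four checks.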


They originally wrote very poor unit tests, so that's part of the cause of
their problem. But C++ is flawed too: this part of the design of C was maybe OK
in 1970, but in 2010 it is unacceptable. This is one of the few cases where
breaking compatibility with C can be acceptable (and I think that breaking C
compatibility for this purpose is more important than breaking it to improve
the semantics of fixed-size arrays, as was recently done).

Common Lisp has taught us that many functions in a program don't need maximum
performance, so using efficient (usually not heap-allocated) multi-precision
integers in them is not going to slow down a program significantly, but it can
avoid many bugs related to integral values. In Lisp, fixnums are usually a
performance optimization you can use in selected performance-critical
functions. In Lisp, using fixnums everywhere in a program is (correctly) seen
as premature optimization.

Even if D doesn't want to go the Common Lisp way, and wants to keep using
C-style fixed-width integers to represent integral values, I feel that having
optional runtime overflow errors for integral values can help locate many of
those bugs during the creation of a program (there can be two compilation
switches: one to switch on those runtime errors only for signed integral
values, and one to switch them on for both signed and unsigned integral
values).

If you don't believe me, take a C# compiler, switch on the overflow errors,
and write a medium-sized program: you will see your compiler and runtime
happily catching several of your integral-related bugs.
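
To make the idea of runtime overflow errors concrete, here is a minimal C++
sketch of what such checks do, written by hand. The helper names checkedAdd,
checkedSub and raisesOverflow are hypothetical; a compiler switch like the one
proposed above would insert equivalent checks automatically instead of
requiring explicit calls:

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical helper: signed addition that reports overflow instead of
// silently producing a wrong value (or undefined behaviour in C++).
int checkedAdd(int a, int b) {
    if ((b > 0 && a > std::numeric_limits<int>::max() - b) ||
        (b < 0 && a < std::numeric_limits<int>::min() - b)) {
        throw std::overflow_error("signed overflow in checkedAdd");
    }
    return a + b;
}

// Hypothetical helper: unsigned subtraction that reports wrap-around.
unsigned checkedSub(unsigned a, unsigned b) {
    if (b > a) {
        throw std::overflow_error("unsigned wrap-around in checkedSub");
    }
    return a - b;
}

// Returns true if calling f() raised std::overflow_error.
template <typename F>
bool raisesOverflow(F f) {
    try {
        f();
    } catch (const std::overflow_error&) {
        return true;
    }
    return false;
}

void checkOverflowExamples() {
    assert(checkedAdd(2, 3) == 5);
    assert(checkedSub(20u, 10u) == 10u);
    // The expression from the article, 10u - 20, is now caught at run time.
    assert(raisesOverflow([] { checkedSub(10u, 20u); }));
}
```

The signed and unsigned checks are separate functions here, which mirrors the
two independent switches suggested above.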

Notes:
- In D, saner and stricter promotion rules between signed and unsigned values
will help too, but they can't replace overflow errors.
- Adding a Sint (safe integral value) struct to the standard library is not a
solution, because generally no one will use it.
- Avoiding the usage of unsigned values wherever possible in the language and
standard library helps too. I don't understand why the length attribute of
arrays and the array indexes are unsigned in D (in C# they are signed, even
though C# allows the user to use unsigned types), but so far I think it's a bad
design choice that I'd like to see changed as soon as possible.
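
The last note can be illustrated with a C++ sketch, since std::vector::size()
is unsigned just like array lengths in D. The function names are hypothetical
and the example is in C++, not D:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Index of the last element, computed in the unsigned size type.
// For an empty vector, 0 - 1 wraps around to the largest size_t value.
std::size_t lastIndexUnsigned(const std::vector<int>& v) {
    return v.size() - 1;
}

// Same computation with a signed length: an empty vector gives -1,
// which a later "index >= 0" test can detect.
std::ptrdiff_t lastIndexSigned(const std::vector<int>& v) {
    return static_cast<std::ptrdiff_t>(v.size()) - 1;
}

void checkLengthExamples() {
    std::vector<int> empty;
    assert(lastIndexUnsigned(empty) == std::numeric_limits<std::size_t>::max());
    assert(lastIndexSigned(empty) == -1);
}
```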

-----------------

The second problem shown in that article has a simpler and less disruptive
solution: named arguments would be something useful to have in D. But this is
an additive change, so I think there is no need to rush; it can wait.

-----------------

The third problem shown in that article was related to the usage of booleans to 
represent an input value for a function. Such usage of a boolean is indeed not 
clear at the calling point:

void textRender(std::string const & text,
                Rectangle const & region,
                bool wrap = false,
                bool bold = false,
                bool justify = false,
                int first_line_indent = 0);

textRender(text,
           full_screen, 
           true); // wrap text


In Python I have seen that named arguments help solve this problem a lot, 
because you use a name that lets you understand the purpose of the boolean.
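
C++ has no named arguments, but their effect at the call site can be
approximated. The following is a sketch only: RenderOptions and its setters
are hypothetical names, not part of the article's code, and an overload of
textRender taking such an object is assumed:

```cpp
#include <cassert>

// Hypothetical options type: each setter names the flag it changes and
// returns the object, so calls can be chained.
struct RenderOptions {
    bool wrap = false;
    bool bold = false;
    bool justify = false;
    int firstLineIndent = 0;

    RenderOptions& setWrap(bool v) { wrap = v; return *this; }
    RenderOptions& setBold(bool v) { bold = v; return *this; }
    RenderOptions& setJustify(bool v) { justify = v; return *this; }
    RenderOptions& setFirstLineIndent(int n) { firstLineIndent = n; return *this; }
};

void checkRenderOptions() {
    // Only the named options change; the others keep their defaults.
    RenderOptions opts = RenderOptions().setWrap(true).setFirstLineIndent(4);
    assert(opts.wrap && !opts.bold && !opts.justify);
    assert(opts.firstLineIndent == 4);
}
```

With such a type, the call from the article would read
textRender(text, full_screen, RenderOptions().setWrap(true)), and the comment
"// wrap text" is no longer needed.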

Alternatively, another possible solution is shown in this wish from the D Wish
List, "Inline enum declaration":
http://all-technology.com/eigenpolls/dwishlist/index.php?it=76

That page contains:
void ShowWindow( enum{Show,Hide} sw ) { ... }
* self-documenting
* better than using "bool" (what's true/false?)
* no dummy types (otherwise enum showwindow_t {...})

On the surface it looks cute, but I don't like that solution much because it's
a locally defined type: you can't name it elsewhere, so you can't store the
argument in a variable before passing it to the function, etc.
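
For comparison, this C++ sketch uses an ordinary named enum (the "dummy type"
that the wish tries to avoid). All names here are hypothetical. Because the
type has a name, a value can be computed and stored before the call:

```cpp
#include <cassert>

// A named enum declared outside the function, so other code can refer to it.
enum class WindowState { Show, Hide };

// Hypothetical stand-in for a real window object.
struct Window {
    bool visible = false;
};

void showWindow(Window& w, WindowState sw) {
    w.visible = (sw == WindowState::Show);
}

// The argument can be produced elsewhere and kept in a variable, which an
// enum declared inline in the parameter list would not allow.
WindowState stateFromConfig(bool startHidden) {
    return startHidden ? WindowState::Hide : WindowState::Show;
}

void checkWindowExamples() {
    Window w;
    WindowState saved = stateFromConfig(false);  // stored before the call
    showWindow(w, saved);
    assert(w.visible);
}
```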

So I think named arguments are enough to solve most of this third problem too.
But named arguments can be added later, for example in D2.5 or D3. D2 contains
enough bugs now, so I think it's better to remove some of them before adding
other _additive_ features. Changing the way integral values are managed, on
the other hand, is a breaking change, so it is not something to postpone to
D3.

Bye,
bearophile
