I'm of the opinion that we should make mixed-sign operations a
compile-time error. I know that it would be annoying in some
situations, but IMHO it gives you clearer, more reliable code.
It's a mistake to have implicit casts that lose information.
Want to hear a funny/sad, but somewhat related story? I was chasing
down a segfault recently at work. I hunted and hunted, and finally
found out that the pointer returned from malloc() was bad. I figured
that I was overwriting the heap, right? So I added tracing and
debugging everywhere...no luck.
I finally, in desperation, included <stdlib.h> in the source file (there
was a warning about malloc() not being prototyped)...and the segfaults
vanished!!!
The problem was that the xlc compiler, when it doesn't have the
prototype for a function, assumes that it returns int...but int is 32
bits. Moreover, the compiler was happily implicitly casting that int to
a pointer...which was 64 bits.
The compiler was silently cropping the top 32 bits off my pointers.
And it all was a "feature" to make programming "easier."
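In case the mechanism isn't obvious, here's a rough C sketch of what was
going on (my reconstruction, assuming an LP64 target where int is 32 bits
and pointers are 64 bits; the cast chain just reproduces the crop
explicitly):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>   /* the include that was missing: declares malloc() as returning void* */

int main(void)
{
    char *p = malloc(64);
    if (p == NULL)
        return 1;

    /* Without the prototype, the compiler assumed malloc() returned a 32-bit
       int and then converted that int back to a 64-bit pointer, throwing away
       the upper half of the address. The cast chain below mimics that crop. */
    void *cropped = (void *)(uintptr_t)(uint32_t)(uintptr_t)p;

    printf("real pointer:    %p\n", (void *)p);
    printf("cropped pointer: %p\n", cropped);

    free(p);
    return 0;
}

If the real allocation happens to sit above the 4 GB line, the two prints
differ and the cropped pointer is garbage.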
Russ
Andrei Alexandrescu wrote:
D pursues compatibility with C and C++ in the following manner: if a
code snippet compiles in both C and D or C++ and D, then it should have
the same semantics.
A classic problem with C and C++ integer arithmetic is that any
operation involving at least one unsigned integral automatically
receives an unsigned type, regardless of how silly that actually is,
semantically. About the only advantage of this rule is that it's simple.
Beyond that, IMHO, it has only disadvantages.
The following operations suffer from the "abusive unsigned syndrome" (u
is an unsigned integral, i is a signed integral):
(1) u + i, i + u
(2) u - i, i - u
(3) u - u
(4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C
requires that these all return unsigned, ouch)
(5) u < i, i < u, u <= i etc. (all ordering comparisons)
(6) -u
Bitwise operations &, |, and ^ also yield unsigned, but such cases are
less abusive because at least the operation wasn't arithmetic in the
first place. Comparing for equality is also quite a conundrum - should
minus two billion compare equal to 2_294_967_296? I'll ignore these for
now and focus on (1) - (6).
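To make (1) - (6) concrete, here is a small C program (a sketch; the
printed values assume 32-bit int and unsigned) showing the kind of
answers the current rule quietly produces:

#include <stdio.h>

int main(void)
{
    unsigned u = 2, v = 3;
    int i = -3;

    /* (1): i is converted to unsigned, so the sum wraps around. */
    printf("u + i = %u\n", u + i);       /* 4294967295, not -1 */

    /* (3): the difference of two unsigneds is unsigned even when negative. */
    printf("u - v = %u\n", u - v);       /* 4294967295, not -1 */

    /* (5): the signed side of an ordering comparison is converted too. */
    printf("i < u = %d\n", i < u);       /* 0: "minus three is not less than two" */

    /* (6): negating an unsigned compiles and stays unsigned. */
    printf("-u    = %u\n", -u);          /* 4294967294 */

    return 0;
}

All of these compile silently (the comparison may at best draw a
-Wsign-compare warning), which is exactly the complaint.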
So far we haven't found a solid solution to this problem that at the
same time allows "good" code to pass through, weeds out "bad" code, and is
compatible with C and C++. The closest I got was to have the compiler
define the following internal types:
__intuint
__longulong
I've called them "dual-signed integers" in the past, but let's try the
shorter "undecided sign". Each of these is a subtype of both the signed
and the unsigned integral in its name, e.g. __intuint is a subtype of
both int and uint. (Originally I thought of defining __byteubyte and
__shortushort as well but dropped them in the interest of simplicity.)
The sign-ambiguous operations (1) - (6) yield __intuint if no operand
size was larger than 32 bits, and __longulong otherwise. Undecided sign
types define their own operations. Let x and y be values of undecided
sign. Then x + y, x - y, and -x also return a sign-ambiguous integral
(the size is that of the largest operand). However, the other operators
do not work on sign-ambiguous integrals, e.g. x / y would not compile
because you must decide what sign x and y should have prior to invoking
the operation. (Rationale: multiplication/division work differently
depending on the signedness of their operands).
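As a quick C illustration of why the sign must be decided before
dividing (a sketch, assuming 32-bit int; the same bits yield two
different quotients):

#include <stdio.h>

int main(void)
{
    int i = -6;
    unsigned u = 2;

    /* Under the C rule, i is converted to unsigned before dividing:
       4294967290 / 2, not -6 / 2. */
    printf("i / u      = %u\n", i / u);        /* 2147483645 */

    /* Deciding "signed" first gives the intended result. */
    printf("i / (int)u = %d\n", i / (int)u);   /* -3 */

    return 0;
}

The same bit pattern divides to 2147483645 or to -3 depending on which
sign you pick; that is exactly the ambiguity the undecided-sign type
refuses to paper over.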
User code cannot define a symbol of sign-ambiguous type, e.g.
auto a = u + i;
would not compile. However, given that __intuint is a subtype of both
int and uint, it can be freely converted to either whenever there's no
ambiguity:
int a = u + i; // fine
uint b = u + i; // fine
The advantage of this scheme is that it weeds out many (most? all?)
surprises and oddities caused by the abusive unsigned rule of C and C++.
The disadvantage is that it is more complex and may surprise the novice
in its own way by refusing to compile code that looks legit.
At the moment, we're in limbo regarding the decision to go forward with
this. Walter, like many good long-time C programmers, knows the abusive
unsigned rule so well that he's not hurt by it and consequently has little
incentive to see it as a problem. I have had to teach C and C++ to young
students coming from Java introductory courses and have a more
up-to-date perspective on the dangers. My strong belief is that we need
to address this mess somehow, which type inference will only make more
painful (in the hands of a beginner, auto can be quite a dangerous tool
for propagating wrong beliefs). I also know seasoned programmers who had
no idea that -u compiles and that it also oddly returns an unsigned type.
Your opinions, comments, and suggestions for improvements would, as
always, be welcome.
Andrei