Denis Koroskin wrote:
On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu wrote:
D pursues compatibility with C and C++ in the following manner: if a
code snippet compiles in both C and D or C++ and D, then it should
have the same semantics.
A classic problem with C and C++ integer arithmetic is that any
operation involving at least one unsigned integral automatically
receives an unsigned type, regardless of how silly that actually
is, semantically. About the only advantage of this rule is that it's
simple. IMHO it only has disadvantages from then on.
The following operations suffer from the "abusive unsigned syndrome"
(u is an unsigned integral, i is a signed integral):
(1) u + i, i + u
(2) u - i, i - u
(3) u - u
(4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C
requires that these all return unsigned, ouch)
(5) u < i, i < u, u <= i etc. (all ordering comparisons)
(6) -u
Logic operations &, |, and ^ also yield unsigned, but such cases are
less abusive because at least the operation wasn't arithmetic in the
first place. Comparing for equality is also quite a conundrum - should
minus two billion compare equal to 2_294_967_296? I'll ignore these
for now and focus on (1) - (6).
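
To make a few of the cases above concrete, here is a minimal sketch in D as it behaves today, under the C-compatible rule. The helper names (mixedSum, unsignedDiff, mixedQuotient, mixedLess) are made up for illustration and are not part of any library.

```d
import std.stdio : writeln;

// Each helper returns `auto` so the compiler-chosen result type is visible.
auto mixedSum(uint u, int i)      { return u + i; }  // case (1)
auto unsignedDiff(uint a, uint b) { return a - b; }  // case (3)
auto mixedQuotient(int i, uint u) { return i / u; }  // case (4)
bool mixedLess(int i, uint u)     { return i < u; }  // case (5)

// All three arithmetic results are typed uint.
static assert(is(typeof(mixedSum(1u, -2)) == uint));
static assert(is(typeof(unsignedDiff(3u, 5u)) == uint));
static assert(is(typeof(mixedQuotient(-2, 2u)) == uint));

void main()
{
    writeln(mixedSum(1, -2));       // 4294967295, not -1
    writeln(unsignedDiff(3, 5));    // 4294967294, not -2
    writeln(mixedQuotient(-2, 2));  // 2147483647, not -1
    writeln(mixedLess(-2, 1));      // false: -2 is converted to uint first
}
```

In every line the signed operand is converted to uint before the operation, which is where the surprising values come from.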
So far we haven't found a solid solution to this problem that at the
same time allows "good" code to pass through, weeds out "bad" code, and
is compatible with C and C++. The closest I got was to have the
compiler define the following internal types:
__intuint
__longulong
I've called them "dual-signed integers" in the past, but let's try the
shorter "undecided sign". Each of these is a subtype of both the
signed and the unsigned integral in its name, e.g. __intuint is a
subtype of both int and uint. (Originally I thought of defining
__byteubyte and __shortushort as well but dropped them in the interest
of simplicity.)
The sign-ambiguous operations (1) - (6) yield __intuint if no operand
size was larger than 32 bits, and __longulong otherwise. Undecided
sign types define their own operations. Let x and y be values of
undecided sign. Then x + y, x - y, and -x also return a sign-ambiguous
integral (the size is that of the largest operand). However, the other
operators do not work on sign-ambiguous integrals, e.g. x / y would
not compile because you must decide what sign x and y should have
prior to invoking the operation. (Rationale: multiplication/division
work differently depending on the signedness of their operands).
User code cannot define a symbol of sign-ambiguous type, e.g.
auto a = u + i;
would not compile. However, given that __intuint is a subtype of both
int and uint, it can be freely converted to either whenever there's no
ambiguity:
int a = u + i; // fine
uint b = u + i; // fine
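
The rules above can be roughly approximated in library code. The sketch below is only an emulation, not the proposed compiler feature: IntUint, add, asInt and asUint are hypothetical names, and because a struct cannot convert implicitly to both int and uint, the caller picks the sign through an explicit accessor. A library type also cannot forbid `auto`, which the real proposal would.

```d
// Hypothetical library-level stand-in for the proposed built-in __intuint.
struct IntUint
{
    private uint bits;  // raw 32-bit pattern; the sign is not yet decided

    // Addition and subtraction produce the same bits for int and uint,
    // so the result can stay sign-ambiguous.
    IntUint opBinary(string op)(IntUint rhs) const
        if (op == "+" || op == "-")
    {
        return IntUint(mixin("bits " ~ op ~ " rhs.bits"));
    }

    // Negation is likewise bit-identical in two's complement.
    IntUint opUnary(string op)() const
        if (op == "-")
    {
        return IntUint(-bits);
    }

    // The caller must commit to a sign to get a usable value out.
    int asInt() const { return cast(int) bits; }
    uint asUint() const { return bits; }
}

// Models what `u + i` would yield under the proposal.
IntUint add(uint u, int i)
{
    return IntUint(u + i);
}

// x + y is allowed on undecided values; x / y is deliberately rejected.
static assert(__traits(compiles, IntUint.init + IntUint.init));
static assert(!__traits(compiles, IntUint.init / IntUint.init));

void main()
{
    int a = add(1, -2).asInt;    // the caller asked for a signed result
    uint b = add(1, -2).asUint;  // the caller asked for an unsigned result
    assert(a == -1);
    assert(b == uint.max);
    assert((-add(1, -2)).asInt == 1);
}
```

The two static asserts mirror the rule that sums and differences stay undecided while division forces a choice of sign first.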
The advantage of this scheme is that it weeds out many (most? all?)
surprises and oddities caused by the abusive unsigned rule of C and
C++. The disadvantage is that it is more complex and may surprise the
novice in its own way by refusing to compile code that looks legit.
At the moment, we're in limbo regarding the decision to go forward
with this. Walter, like many good long-time C programmers, knows the
abusive unsigned rule so well he's not hurt by it and consequently has
little incentive to see it as a problem. I have had to teach C and C++
to young students coming from Java introductory courses and have a
more up-to-date perspective on the dangers. My strong belief is that
we need to address this mess somehow, which type inference will only
make more painful (in the hands of a beginner, auto can be quite a
dangerous tool for propagating wrong beliefs). I also know seasoned
programmers who had no idea that -u compiles and that it also oddly
returns an unsigned type.
Your opinions, comments, and suggestions for improvements would as
always be welcome.
Andrei
I think it's fine. That's the way LLVM stores integral values
internally, IIRC.
But what is the type of -u? If it is undecided, then the following
should compile:
uint u = 100;
uint s = -u; // undecided implicitly convertible to unsigned
Yah, but at least you actively asked for an unsigned. Compare and
contrast with surprises such as:
uint a = 5;
writeln(-a); // prints 4294967291, not -5
Such code would be disallowed in the undecided-sign regime.
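
For reference, here is a small sketch of what the negation case does in D today, alongside the spelling that states the intended sign explicitly. The helper name negate is made up for illustration.

```d
import std.stdio : writeln;

// Returns -a exactly as the compiler types it under the current rule.
auto negate(uint a) { return -a; }

// Negating a uint still yields a uint.
static assert(is(typeof(negate(5u)) == uint));

void main()
{
    uint a = 5;
    writeln(-a);            // prints 4294967291
    writeln(-cast(int) a);  // prints -5: the sign was chosen explicitly
}
```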
Andrei