Re: Treating the abusive unsigned syndrome

Don Fri, 28 Nov 2008 04:45:18 -0800

Andrei Alexandrescu wrote:

KennyTM~ wrote:
KennyTM~ wrote:
Andrei Alexandrescu wrote:
Don wrote:
Andrei Alexandrescu wrote:
Don wrote:
Andrei Alexandrescu wrote:
One fear of mine is the reaction of throwing one's hands in the air: "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign, ready to be converted to their counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:
(A) You think that it is an approximation to a natural number, i.e., a 'positive int'.
(B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour: once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.
Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to occasionally puzzling behavior. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.
A natural int can always be implicitly converted to either int or uint with perfect safety, since its values fit in both. No other conversions are possible without a cast.
Non-negative literals and manifest constants are naturals.

The rules are:
1. Anything involving unsigned is unsigned, (same as C).
2. Else if it contains an integer, it is an integer.
3. (Now we know all quantities are natural):
If it contains a subtraction, it is an integer. [Probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs.]
4. Else it is a natural.
The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.
[Just before posting I've discovered that other people have posted some similar ideas.]
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table.
Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".
Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.
I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.
One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.
(a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.
(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and have operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.

The problem with that is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous.


uint.max - 10 is a uint.

It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max.

uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1 - u2 is always correct, since that's what's happening mathematically.

I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally: you should need to either declare a type as uint, or use the 'u' suffix on a literal. Right now, the fact that properties like 'length' are uint means you get too many surprising uints, especially when using 'auto'.

I take your point about not wanting to give up the full 32 bits of address space. The problem is that if you have an object x which is >2GB and a small object y, then x.length - y.length will erroneously be negative when uint - uint yields an int. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination:


length is uint
byte[].length can exceed 2GB, and code is correct when it does
uint - uint is an int (or even, can implicitly convert to int)

As far as I can tell, at least one of these has to go.
