Walter Bright:

>D's safe mode, integer overflow *cannot* lead to memory corruption. So when 
>you say something is "unsafe", I think it's reasonable to ask what you mean by 
>it.<

I meant "numerically safer". That is, it helps avoid some of the integer-related bugs.


>We've tried very hard to not have such things in D. The idea is that code that 
>looks the same either behaves the same or issues an error. There's no way to 
>make your proposal pass this requirement.<

I see. We can drop this, then.


>We can argue forever with how significant it is, I don't assign nearly as much 
>to it as you do.<

I see. If you try solving many Project Euler problems you can see how common 
those bugs are :-) In other kinds of code they are probably less common.


>If you use the D loop abstractions, you should never have these issues with 
>it.<

In D I probably use higher-level loop abstractions than the ones you normally 
use, but now and then I hit those bugs anyway. Taking the length of an array is 
necessary now and then even if you use loop abstractions (and higher-order 
functions such as map, filter, etc.).
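The classic case is an unsigned array length wrapping around. A minimal sketch of the problem, written here in plain Python simulating 32-bit unsigned (size_t-style) arithmetic rather than in D, with helper names of my own choosing:

```python
# Simulate 32-bit unsigned (size_t-style) arithmetic in Python.
U32 = 2 ** 32

def u32(x):
    """Wrap an integer into the range of a 32-bit unsigned word."""
    return x % U32

arr = []  # an empty array

# Intended loop bound: length - 1. With signed arithmetic it is -1,
# so a loop over 0 .. bound simply does not run.
signed_bound = len(arr) - 1          # -1

# With unsigned arithmetic the same expression wraps around to a huge
# value, and a naive loop runs (and indexes) billions of times.
unsigned_bound = u32(len(arr) - 1)   # 4294967295

print(signed_bound, unsigned_bound)
```

With a signed length the empty-array case degrades gracefully; with an unsigned one it silently becomes a near-infinite loop.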


>Here's what the wikipedia said about it.
>>"In Python, a number that becomes too large for an integer seamlessly becomes 
>>a long.[1] And in Python 3.0, integers and arbitrary sized longs are 
>>unified."<<

This is exactly the same thing I have said :-)


>(Just switching to long isn't good enough - what happens when long overflows?<

Maybe this is where you misunderstood the situation: in Python 2.x "long" 
means a multi-precision integer. In my example the number was 1001 decimal 
digits long.
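For instance, in Python (shown here in Python 3, where ints and longs are unified), a 1001-digit value is handled transparently, with no overflow at any point:

```python
# In Python, integers grow to arbitrary precision instead of overflowing.
x = 10 ** 1000          # a 1001-digit number
print(len(str(x)))      # 1001

# Arithmetic on it behaves like on any other int; nothing wraps around.
y = x * x + 1           # 10**2000 + 1
print(len(str(y)))      # 2001
```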


>I generally don't like solution like this because it makes tripping the bug so 
>rare that it can lurk for years. I prefer to flush bugs out in the open 
>early.)<

In Python 2.x this causes zero bugs, because those "longs" are multi-precision.


>3x is a BIG deal. If you're running a major site, this means you only need 1/3 
>of the hardware, and 1/3 of the electric bill. If you're running a program 
>that takes all day, now you can run it 3 times that day.<

This point of the discussion is probably too indefinite to say anything useful 
about. I can answer that in the critical spots of a program it is probably easy 
enough to replace multi-precision ints with fixnums, and this can make the 
whole program not significantly slower than C code. And in some places the 
compiler can infer where fixnums are enough and use them automatically.

In the end, on this point mine is mostly a gut feeling derived from many years 
of using multi-precision numbers: I think that in a nearly-system language like 
D, well-implemented multi-precision numbers (with the option to use fixnums in 
critical spots) can lead to efficient enough programs. I have programmed a bit 
in compiled CLisp, and its integer performance is not so bad. I can of course 
be wrong, but only an actual test can show it :-) Maybe someday I will try it 
and do some benchmarks. The current BigInt of D needs the small-number 
optimization before such a test can be tried (that is, avoiding heap 
allocation when the big number fits in 32 or 64 bits), and the compiler is not 
smart enough to replace bigints with ints where bigints are not necessary.

In the meantime I have done several benchmarks in C# with runtime integral 
overflow checks enabled or disabled, and I have seen that the performance with 
the checks enabled is only a bit lower, not significantly so (I saw the same 
thing in Delphi years ago).
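What such a runtime check amounts to can be sketched in Python. The `checked_add` helper below is hypothetical (my own name), simulating 32-bit signed arithmetic; C# does the equivalent natively with its `checked` keyword:

```python
INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1

def checked_add(a, b):
    """Add two 32-bit signed ints, raising instead of wrapping on overflow."""
    r = a + b
    if not (INT32_MIN <= r <= INT32_MAX):
        raise OverflowError(f"{a} + {b} overflows a 32-bit signed int")
    return r

print(checked_add(1, 2))        # 3
try:
    checked_add(INT32_MAX, 1)   # trips the check at the overflow point
except OverflowError as e:
    print("caught:", e)
```

The cost per operation is one comparison and a rarely-taken branch, which is why the measured slowdown tends to be small.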


>That idea has a lot of merit for 64 bit systems. But there are two problems 
>with it: 1. D source code is supposed to be portable between 32 and 64 bit 
>systems. This would fail miserably if the sign of things silently change in 
>the process.<

Then we can use a signed word on 32-bit systems too.
Or, if you don't like that, we can represent lengths/indexes with 64-bit 
signed values on 32-bit systems as well.


>2. For an operating system kernel's memory management logic, it still would 
>make sense to represent the address space as a flat range from 0..n, not one 
>that's split in the middle, half of which is accessed with negative offsets. D 
>is supposed to support OS development.<

I am not expert enough in this area to fully understand the downsides of using 
signed numbers there. But I can say that D is already not the best language for 
developing non-toy operating systems.
And even if someone writes a serious operating system in D, that is an uncommon 
application of the language: probably 95% of people write other kinds of 
programs, where unsigned integers everywhere are not the best choice.
And the uncommon people who want to write an OS or a device driver in D can use 
signed words. Such uncommon people can even design and use their own array type 
with unsigned-word lengths/indexes :-)
Designing D to appeal to a very uncommon kind of power user who needs to write 
an operating system in D doesn't look like a good design choice.

If this whole thread goes nowhere, then later I can even close bug 3843, 
because there's little point in keeping it open.

Bye,
bearophile
