On Thu, 22 Apr 2010 17:29:53 +0200 [email protected] wrote:
> Data:
> Under NetBSD/gcc, I have the following values:
>
> before: x1:=5440, x2:=-5843, x3:=78909
> after: x1:=5440, x2:=-201, x3:=18166, r:=6827 t:=30232
>
> Under Plan9/gcc, I have the following values:
>
> before: x1:=5440, x2:=-5843, x3:=78909
> after: x1:=5440, x2:=2147483447, x3:=1073759990, r:=6827 t:=-1073711592
>
> Uhm... seems to have a `slight' divergence...
>
> In fact, all wrong values depend upon x2, that has the "correct"
> value... with 2^31 complement. A positive when it should be negative,
> since the offending code is the following:
>
> x2 = half ( x1 + x2 + xicorr ) ;
>
> that is :
> x2 = (5440 - 5843 + 1) / 2;
>
> Not exactly pushing things to the limit! And yes, the expected result is
> indeed -201.
You would get 2147483447 if x1 and x2 were treated as 32-bit
unsigned numbers but -201 if treated as signed: the sum -402
wraps to 4294966894 as unsigned, and half of that is
2147483447. Try this:
cat > x.c <<EOF
#include <stdio.h>
#include <stdlib.h>
NUM f(NUM x, NUM y) { return (x + y + 1) / 2; }
int main(int c, char**v) { printf("%d\n", f(atoi(v[1]), atoi(v[2]))); return 0; }
EOF
cc -DNUM=signed x.c && ./a.out 5440 -5843
cc -DNUM=unsigned x.c && ./a.out 5440 -5843
What are the types of x1 and x2? Can you show an actual C code
fragment? Don't worry about it being complete. Just the half()
function (or macro), the header of the function where it is
called, the declarations for x1 and x2 and a couple of lines
around the call to half(). I am still wondering if this is due
to a different interpretation of language semantics by the two
compilers.
> Since the problem arises in this context, but not if you just add
> this isolated in a test program, and call it with these very 3
> values (5440, -5843, 1), it is clear that's the way the computation
> is handled with huge number of parameters and auto variables
> that wreaks havoc.
You *suspect* this but you need to prove it. An isolated
test case that doesn't trigger this problem simply means you
have not created the right condition for the bug. Creating a
simple test can be tricky and may be more work than debugging
your program.
> If I declare all the auto volatile, this does nothing: same result.
>
> If I do the addition, and afterwards take the half, that works:
>
> x2 += x1 + xicorr;
> x2 = half(x2); /* works! */
I wouldn't bother changing anything. You already have a
smoking gun (at least you know in which neighbourhood it has
gone off). You can try a binary search to narrow down the
area but in the end you will have to look at the assembly
output of the relevant code fragment.