On Wednesday, 23 October 2013 at 15:44:54 UTC, Apollo Hogan wrote:
For example, the appended code produces the following output
when compiled (DMD32 D Compiler v2.063.2 under WinXP/cygwin)
with no optimization:
immutable(pair)(1.1, -2.03288e-20)
pair(1, 0.1)
pair(1.1, -8.32667e-17)
and the following results when compiled with optimization (-O):
immutable(pair)(1.1, -2.03288e-20)
pair(1, 0.1)
pair(1.1, 0)
The desired result would be:
immutable(pair)(1.1, -8.32667e-17)
pair(1, 0.1)
pair(1.1, -8.32667e-17)
Cheers,
--Apollo
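import std.stdio;

struct pair { double hi, lo; }

// Fast2Sum (quick-two-sum), valid for |hi| >= |lo|: h is the rounded sum and
// l is the exact rounding error, provided each operation is rounded to double.
pair normalize(pair q)
{
    double h = q.hi + q.lo;
    double l = q.lo + (q.hi - h);
    return pair(h, l);
}

void main()
{
    // The initializer of a static immutable is evaluated at compile time (CTFE).
    immutable static pair spn = normalize(pair(1.0, 0.1));
    writeln(spn);
    writeln(pair(1.0, 0.1));
    writeln(normalize(pair(1.0, 0.1)));
}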
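For comparison with the desired output above, here is a minimal sketch of the same normalize step in Python. Python's float is an IEEE 754 binary64 double, and on platforms that use SSE2 for double arithmetic (e.g. x86-64) every operation is rounded to double. That is exactly what the desired result assumes. The function name simply mirrors the D code. Python serves here only as a reference, not as part of the original report.

```python
# Reference sketch (assumption: SSE2-style double arithmetic, no x87
# extended-precision intermediates): Fast2Sum with every step rounded
# to IEEE binary64.

def normalize(hi: float, lo: float):
    """Return (h, l): h = fl(hi + lo), l = rounding error of that sum."""
    h = hi + lo          # rounded sum
    l = lo + (hi - h)    # hi - h is exact here, so l recovers the error
    return h, l

h, l = normalize(1.0, 0.1)
print(h, f"{l:g}")       # 1.1 -8.32667e-17
```

The second component is exactly -3 * 2**-55, which D's `writeln` formats as -8.32667e-17.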
I can replicate it here. Here is the objdump disassembly of normalize in both builds:
Optimized:

08076bdc <_D6fptest9normalizeFS6fptest4pairZS6fptest4pair>:
 8076bdc: 55         push   %ebp
 8076bdd: 8b ec      mov    %esp,%ebp
 8076bdf: 83 ec 10   sub    $0x10,%esp
 8076be2: dd 45 08   fldl   0x8(%ebp)
 8076be5: d9 c0      fld    %st(0)
 8076be7: 89 c1      mov    %eax,%ecx
 8076be9: dc 45 10   faddl  0x10(%ebp)
 8076bec: dd 55 f0   fstl   -0x10(%ebp)
 8076bef: de e9      fsubrp %st,%st(1)
 8076bf1: dd 45 f0   fldl   -0x10(%ebp)
 8076bf4: d9 c9      fxch   %st(1)
 8076bf6: dc 45 10   faddl  0x10(%ebp)
 8076bf9: dd 5d f8   fstpl  -0x8(%ebp)
 8076bfc: dd 45 f8   fldl   -0x8(%ebp)
 8076bff: d9 c9      fxch   %st(1)
 8076c01: dd 19      fstpl  (%ecx)
 8076c03: dd 59 08   fstpl  0x8(%ecx)
 8076c06: 8b e5      mov    %ebp,%esp
 8076c08: 5d         pop    %ebp
 8076c09: c2 10 00   ret    $0x10

Unoptimized:

08076bdc <_D6fptest9normalizeFS6fptest4pairZS6fptest4pair>:
 8076bdc: 55         push   %ebp
 8076bdd: 8b ec      mov    %esp,%ebp
 8076bdf: 83 ec 14   sub    $0x14,%esp
 8076be2: dd 45 08   fldl   0x8(%ebp)
 8076be5: dc 45 10   faddl  0x10(%ebp)
 8076be8: dd 5d ec   fstpl  -0x14(%ebp)
 8076beb: dd 45 08   fldl   0x8(%ebp)
 8076bee: dc 65 ec   fsubl  -0x14(%ebp)
 8076bf1: dc 45 10   faddl  0x10(%ebp)
 8076bf4: dd 5d f4   fstpl  -0xc(%ebp)
 8076bf7: dd 45 ec   fldl   -0x14(%ebp)
 8076bfa: dd 18      fstpl  (%eax)
 8076bfc: dd 45 f4   fldl   -0xc(%ebp)
 8076bff: dd 58 08   fstpl  0x8(%eax)
 8076c02: c9         leave
 8076c03: c2 10 00   ret    $0x10
 8076c06: 90         nop
 8076c07: 90         nop
 8076c08: 90         nop
 8076c09: 90         nop
 8076c0a: 90         nop
 8076c0b: 90         nop
I cannot see any significant difference. The fadd-fsub-fadd
sequence seems to be the same in both cases.
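Even with the same fadd-fsub-fadd sequence, the result can still depend on whether h is rounded to double before the subtraction. This is offered as an assumption, not something established above. In the optimized listing, h is written with `fstl`, which leaves the unrounded value on the x87 stack for the following `fsubrp`. The unoptimized listing writes it with `fstpl` and reloads the rounded double before `fsubl`. The hypothetical functions below model both cases in Python. `fractions.Fraction` stands in for the x87 register. That substitution is only valid for this input, because 1.0 + 0.1 fits exactly in a 64-bit significand (assuming the x87 precision control is set to 64 bits). It is not a general 80-bit emulator.

```python
from fractions import Fraction

def normalize_rounded_h(hi: float, lo: float):
    # Models the unoptimized code: h is stored as a double and reloaded
    # before the subtraction (fstpl then fsubl).
    h = hi + lo
    return h, lo + (hi - h)

def normalize_unrounded_h(hi: float, lo: float):
    # Hypothetical model of the optimized code: fstl writes a rounded copy
    # of h to memory, but the subtraction uses the unrounded register value.
    # Fraction gives the exact sum, which equals the 80-bit x87 value only
    # because 1.0 + 0.1 fits in a 64-bit significand.
    h_exact = Fraction(hi) + Fraction(lo)
    h = float(h_exact)                          # value stored by fstl
    l = Fraction(lo) + (Fraction(hi) - h_exact)  # fsub/fadd on the register
    return h, float(l)

print(normalize_rounded_h(1.0, 0.1))    # l == -3 * 2**-55, about -8.33e-17
print(normalize_unrounded_h(1.0, 0.1))  # (1.1, 0.0)
```

The second case reproduces the `pair(1.1, 0)` seen with -O. This only shows that the two storage strategies can diverge; it does not confirm that this is what DMD does.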