On Wednesday, 23 October 2013 at 15:44:54 UTC, Apollo Hogan wrote:
For example, the appended code produces the following output when compiled (DMD32 D Compiler v2.063.2 under WinXP/cygwin) with no optimization:

immutable(pair)(1.1, -2.03288e-20)
pair(1, 0.1)
pair(1.1, -8.32667e-17)

and the following results when compiled with optimization (-O):

immutable(pair)(1.1, -2.03288e-20)
pair(1, 0.1)
pair(1.1, 0)

The desired result would be:

immutable(pair)(1.1, -8.32667e-17)
pair(1, 0.1)
pair(1.1, -8.32667e-17)

Cheers,
--Apollo

import std.stdio;
struct pair { double hi, lo; }
pair normalize(pair q)
{
  double h = q.hi + q.lo;
  double l = q.lo + (q.hi - h);
  return pair(h,l);
}
void main()
{
  immutable static pair spn = normalize(pair(1.0,0.1));
  writeln(spn);
  writeln(pair(1.0,0.1));
  writeln(normalize(pair(1.0,0.1)));
}

I can replicate it here. Here is an objdump diff of normalize:

08076bdc <_D6fptest9normalizeFS6fptest4pairZS6fptest4pair>:   (same symbol and address in both columns)

Optimized:                            | Unoptimized:
8076bdc: 55        push %ebp            8076bdc: 55        push %ebp
8076bdd: 8b ec     mov %esp,%ebp        8076bdd: 8b ec     mov %esp,%ebp
8076bdf: 83 ec 10  sub $0x10,%esp     | 8076bdf: 83 ec 14  sub $0x14,%esp
8076be2: dd 45 08  fldl 0x8(%ebp)       8076be2: dd 45 08  fldl 0x8(%ebp)
8076be5: d9 c0     fld %st(0)         | 8076be5: dc 45 10  faddl 0x10(%ebp)
8076be7: 89 c1     mov %eax,%ecx      | 8076be8: dd 5d ec  fstpl -0x14(%ebp)
8076be9: dc 45 10  faddl 0x10(%ebp)   | 8076beb: dd 45 08  fldl 0x8(%ebp)
8076bec: dd 55 f0  fstl -0x10(%ebp)   | 8076bee: dc 65 ec  fsubl -0x14(%ebp)
8076bef: de e9     fsubrp %st,%st(1)  | 8076bf1: dc 45 10  faddl 0x10(%ebp)
8076bf1: dd 45 f0  fldl -0x10(%ebp)   | 8076bf4: dd 5d f4  fstpl -0xc(%ebp)
8076bf4: d9 c9     fxch %st(1)        | 8076bf7: dd 45 ec  fldl -0x14(%ebp)
8076bf6: dc 45 10  faddl 0x10(%ebp)   | 8076bfa: dd 18     fstpl (%eax)
8076bf9: dd 5d f8  fstpl -0x8(%ebp)   | 8076bfc: dd 45 f4  fldl -0xc(%ebp)
8076bfc: dd 45 f8  fldl -0x8(%ebp)    | 8076bff: dd 58 08  fstpl 0x8(%eax)
8076bff: d9 c9     fxch %st(1)        | 8076c02: c9        leave
8076c01: dd 19     fstpl (%ecx)       | 8076c03: c2 10 00  ret $0x10
8076c03: dd 59 08  fstpl 0x8(%ecx)    | 8076c06: 90        nop
8076c06: 8b e5     mov %ebp,%esp      | 8076c07: 90        nop
8076c08: 5d        pop %ebp           | 8076c08: 90        nop
8076c09: c2 10 00  ret $0x10          | 8076c09: 90        nop
                                      > 8076c0a: 90        nop
                                      > 8076c0b: 90        nop

I cannot see any significant difference. The fadd-fsub-fadd sequence seems to be the same in both cases.
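
One detail in the listing may still be worth checking. The optimized column stores the sum with fstl, which leaves the unrounded value on the x87 stack for the following fsubrp. The unoptimized column uses fstpl and reloads the rounded double before fsubl. The sketch below only illustrates that idea and is not code from the original post: normalizeWide is a hypothetical variant of normalize that keeps h in a real temporary, on the assumption that real is the 80-bit x87 format on the target.

```d
import std.stdio;

struct pair { double hi, lo; }

// Hypothetical variant (not in the original post): h is held in a `real`
// temporary, imitating a sum that stays in an x87 register instead of
// being rounded to double before it is reused.
pair normalizeWide(pair q)
{
    real h = cast(real) q.hi + q.lo;  // sum kept at `real` precision
    real l = q.lo + (q.hi - h);       // error term computed from the unrounded h
    return pair(cast(double) h, cast(double) l);
}

void main()
{
    writeln(normalizeWide(pair(1.0, 0.1)));
}
```

With an 80-bit real, the sum of the doubles 1.0 and 0.1 is represented exactly, so q.hi - h cancels q.lo and lo should come out as 0, which is what the -O build prints. Where real is no wider than double, this variant should behave like the original normalize.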
