D-Ers,

I have been getting counterintuitive results on avx/no-avx timing
experiments.  Storyline to date (notes at end):

**Experiment #1)** Real float data type (i.e. non-complex numbers),
speed comparison.
a) moving from non-avx --> avx shows an unrealistic speed-up of 15-25x.
b) this is weird, but the story continues ...

**Experiment #2)** Real double data type (non-complex numbers):
a) moving from non-avx --> avx again shows amazing gains, but they are
      about half of those seen in Experiment #1, so maybe this looks plausible?

**Experiment #3)**  Complex!float datatypes:
a) now **going from non-avx to avx shows a serious performance LOSS**,
      ranging from a 40% slowdown to breaking even at best.  What is happening here?

**Experiment #4)**  Complex!double:
a) non-avx --> avx shows performance gains again, about 2x (so the
      gains appear to be reasonable).


The main question I have is:

**"What is going on with the Complex!float performance?"** One might expect
floats to have better performance than doubles, as we saw with the
real-valued data (because of vector packing, memory bandwidth, etc.).

But **Complex!float shows MUCH WORSE avx performance than Complex!double
(by a factor of almost 4).**

```d
//                  Table of Computation Times
//
//        self math               std math
//  explicit  no-explicit    explicit  no-explicit
//    align      align         align      align
//    0.12       0.21          0.15       0.21  ;  # Float with AVX
//    3.23       3.24          3.30       3.22  ;  # Float without AVX
//    0.31       0.42          0.31       0.42  ;  # Double with AVX
//    3.25       3.24          3.24       3.27  ;  # Double without AVX
//    6.42       6.62          6.61       6.59  ;  # Complex!float with AVX
//    4.04       4.17          6.68       5.82  ;  # Complex!float without AVX
//    1.67       1.69          1.73       1.71  ;  # Complex!double with AVX
//    3.34       3.42          3.28       3.31     # Complex!double without AVX
```

Notes:

1) Based on forum hints from ldc experts, I got good guidance
   on enabling avx (i.e. compiling the modules on the command line
   with --fast-math and -mcpu=haswell).

2) From Mir-glas experts I received hints to try implementing my own
   version of the complex math (this is what the "self math" column refers to).
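
For readers unsure what "self math" means in note 2, here is a minimal sketch of a hand-written complex multiply applied element-wise over arrays. It is only an illustration under stated assumptions: the names `Cf`, `mul` and `mulArrays` are hypothetical, and this is not the code behind the timings in the table.

```d
import std.stdio : writeln;

// Hypothetical single-precision complex type (illustration only).
// Real and imaginary parts are stored side by side, the same field
// layout that std.complex.Complex!float uses.
struct Cf
{
    float re;
    float im;
}

// Textbook complex multiply: (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
Cf mul(Cf x, Cf y) pure nothrow @nogc @safe
{
    return Cf(x.re * y.re - x.im * y.im,
              x.re * y.im + x.im * y.re);
}

// Element-wise product of two equal-length arrays.
void mulArrays(const(Cf)[] a, const(Cf)[] b, Cf[] result) pure nothrow @nogc @safe
{
    assert(a.length == b.length && a.length == result.length);
    foreach (i; 0 .. a.length)
        result[i] = mul(a[i], b[i]);
}

void main()
{
    auto a = [Cf(1, 2), Cf(0, 1)];
    auto b = [Cf(3, 4), Cf(0, 1)];
    auto r = new Cf[](2);
    mulArrays(a, b, r);
    writeln(r);
}
```

The "std math" column would correspond to the same loop written with `Complex!float` and its built-in `*` operator instead of `mul`.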

I understand that the details of the computations are not included here. (I can
add them if there is interest, and if I figure out an effective way to present
them in a forum.)

But I thought I might begin with a simple question: **"Is there some well-known issue that I am missing here?" Have others been down this road as well?**

Thanks for any and all input.
Best Regards,
James

PS Sorry for the inelegant table ... I do not believe there is a way to include the beautiful bar charts on this forum. (Please correct me
if there is a way...)
