Rather Bizarre slow downs using Complex!float with avx (ldc).

james.p.leblanc via Digitalmars-d-learn Thu, 30 Sep 2021 09:46:33 -0700

D-Ers,

I have been getting counterintuitive results on avx/no-avx timing
experiments.  Storyline to date (notes at end):

**Experiment #1)** Real float data type (i.e. non-complex numbers),
speed comparison.

  a)  moving from non-avx --> avx shows a non-realistic speed up of 15-25 X.

  b)  this is weird, but the story continues ...

**Experiment #2)** Real double data type (non-complex numbers),

  a)  moving from non-avx --> avx again shows amazing gains, but the
      gains are about half of those seen in Experiment #1, so maybe
      this looks plausible?

**Experiment #3)**  Complex!float datatypes:

  a)  now **going from non-avx to avx shows a serious performance LOSS**
      of 40% to breaking even at best.  What is happening here?

**Experiment #4)**  Complex!double:

  a)  non-avx --> avx shows performance gains again of about 2X (so the
      gains appear to be reasonable).


The main question I have is:

**"What is going on with the Complex!float performance?"** One might expect
floats to have better performance than doubles, as we saw with the
real-valued data (because of vector packing, memory bandwidth, etc).

But, **Complex!float shows MUCH WORSE avx performance than Complex!double
(by a factor of almost 4).**

```d
//            Table of Computation Times
//
//       self math              std math
// explicit  no-explicit   explicit  no-explicit
//   align      align        align      align
//   0.12       0.21          0.15      0.21 ;  # Float with AVX
//   3.23       3.24          3.30      3.22 ;  # Float without AVX
//   0.31       0.42          0.31      0.42 ;  # Double with AVX
//   3.25       3.24          3.24      3.27 ;  # Double without AVX
//   6.42       6.62          6.61      6.59 ;  # Complex!float with AVX
//   4.04       4.17          6.68      5.82 ;  # Complex!float without AVX
//   1.67       1.69          1.73      1.71 ;  # Complex!double with AVX
//   3.34       3.42          3.28      3.31    # Complex!double without AVX
```

Notes:

1) Based on forum hints from ldc experts, I got good guidance
   on enabling avx ( i.e. compiling modules on command line, using
   --fast-math and -mcpu=haswell on command line).

2) From Mir-glas experts I received hints to try to implement my own
   version of the complex math (this is what the "self math" column
   refers to).
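To make the "self math" versus "std math" distinction in note 2 concrete, here is a minimal sketch of what the two variants might look like for an element-wise complex multiply. The benchmark code itself is not shown in this post, so the function names (`mulSelf`, `mulStd`, `mulArrays`) and the choice of multiplication as the kernel are purely hypothetical illustrations, not the code that produced the timings above.

```d
import std.complex : Complex;

// "self math" (hypothetical): complex multiply written out by hand
// on the real and imaginary parts.
Complex!T mulSelf(T)(Complex!T a, Complex!T b)
{
    return Complex!T(a.re * b.re - a.im * b.im,
                     a.re * b.im + a.im * b.re);
}

// "std math" (hypothetical): rely on the multiply operator that
// std.complex already provides.
Complex!T mulStd(T)(Complex!T a, Complex!T b)
{
    return a * b;
}

// Element-wise product z[i] = mul(x[i], y[i]); `mul` selects the variant.
void mulArrays(alias mul, C)(const C[] x, const C[] y, C[] z)
{
    assert(x.length == y.length && y.length == z.length);
    foreach (i; 0 .. z.length)
        z[i] = mul(x[i], y[i]);
}
```

A call such as `mulArrays!(mulSelf!float)(x, y, z)` would select the hand-written variant for `Complex!float` arrays. Whether a loop of this shape is what the compiler vectorizes well or poorly with AVX is exactly the open question of this post.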

I understand that details of the computations are not included here (I
can do that if there is interest, and if I figure out an effective way to
present it in a forum).

But, I thought I might begin with a simple question, **"Is there some
well-known issue that I am missing here?" Have others been down this road
as well?**


Thanks for any and all input.
Best Regards,
James

PS Sorry for the inelegant table ... I do not believe there is a way
to include the beautiful bar charts on this forum. (Please correct me
if there is a way...)
