Performance issues with "complex" module

zio_tom78 Sat, 02 May 2020 00:30:21 -0700

I ran a few benchmarks of C++, Julia, and Nim compilers, and I found that Nim's 
implementation of Complex64 type shows sub-optimal performance.


I have implemented a simple algorithm to compute a [Julia 
set](https://en.wikipedia.org/wiki/Julia_set) , and I have found that the 
simplest implementation is quite slow:
    
    
    import complex
    
    func julia(z: Complex64, c: Complex64, maxiter: int = 256): int =
      var iteridx = 0
      var cur_z = z
      while (abs2(cur_z) < 4) and (iteridx < maxiter):
        cur_z = cur_z * cur_z + c
        iteridx += 1
      
      result = if iteridx == maxiter:
                 -1
               else:
                 iteridx
    
    
    Run

Generating a 800x800 image requires 1.5 s on my laptop, to be compared with 
~0.07 s for an almost identical code in Julia (note the pun!). Inspecting the C 
code produced by Nim shows that the compiler is not trying to inline the call 
to functions like 
[*](https://github.com/nim-lang/Nim/blob/version-1-2/lib/pure/complex.nim#L122).
 So I changed the computation within the `while` loop in this way:
    
    
    # cur_z = cur_z * cur_z + c
    cur_z *= cur_z
    cur_z += c
    
    
    Run

The elapsed time went down to ~1 s: better, but still not good! My 
understanding is that this modification helps the compiler to avoid the 
creation of some temporary Complex64 variables.

Then, I decided to manually unroll the computation of the real and imaginary 
parts of `cur_z`:
    
    
    # cur_z = cur_z * cur_z + c
    let tmp = cur_z.re * cur_z.re - cur_z.im * cur_z.im
    cur_z.im = 2 * cur_z.re * cur_z.im + c.im
    cur_z.re = tmp + c.re
    
    
    Run

The code is much uglier, but now the elapsed time is ~0.2 s!

At this point, I have a few questions:

  1. Is there a way to further improve the code and reach Julia's speed (0.07 
s)?
  2. The last version of the code is surely the fastest, but it's really ugly 
to read! Is there an _elegant_ way to keep the elapsed time down to ~0.1 while 
avoiding the need to manually unroll the calculation of the real and imaginary 
parts? (Maybe using `{.inline.}` everywhere in the complex module?)



The full code of the benchmark, as well as the command-line parameters I used 
to compile it, can be found in this Gist: 
[https://gist.github.com/ziotom78/346fff619dc093d473abfe4cb0b8060c](https://gist.github.com/ziotom78/346fff619dc093d473abfe4cb0b8060c)

Performance issues with "complex" module

Reply via email to