Unfortunately, creating good benchmarks is hard. The benchmark above has some 
subtle faults that lessen its effectiveness.

First off, MyBuffer is a tuple type: 
    
    
    type
      MyBuffer = tuple
        d: array[128, int]
        len: int
    

Tuples, like objects, are value types, which means a local variable of that 
type is allocated on the stack. The only time such a value is not allocated on 
the stack is when it is part of a reference type (or otherwise stored inside 
something that lives on the heap).
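
To make the distinction concrete, here is a minimal sketch. MyBufferRef and 
the two procs are names I made up for illustration; they are not part of the 
original benchmark.

    type
      MyBuffer = tuple
        d: array[128, int]
        len: int
      MyBufferRef = ref MyBuffer  # hypothetical name, for illustration only

    proc onStack(): int =
      var b: MyBuffer   # value type: lives in this proc's stack frame
      b.d[0] = 1
      result = b.d[0]

    proc onHeap(): int =
      var b: MyBufferRef
      new(b)            # reference type: storage comes from Nim's allocator
      b[].d[0] = 1
      result = b[].d[0]

    # 128 array elements plus the len field: 129 ints in total
    doAssert sizeof(MyBuffer) == 129 * sizeof(int)
    echo onStack() + onHeap()

Only onHeap() touches the allocator; onStack() just uses space in its own 
stack frame.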

Because MyBuffer is a tuple, all tx() has to do is reserve ~129 integers worth 
of memory on the stack, which is a simple bump allocation (all the program 
has to do is adjust the current stack pointer by a fixed amount). A better way 
to test Nim's allocator is to compare it with the system malloc implementation.
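
For anyone unfamiliar with the term, a bump allocation can be sketched roughly 
like this. BumpArena and bumpAlloc are hypothetical names, and this toy ignores 
alignment and never frees anything; it only shows why the operation is so 
cheap compared with a general-purpose allocator.

    type
      BumpArena = object          # hypothetical toy arena, not a real stack
        buf: array[4096, byte]
        top: int                  # offset of the next free byte

    proc bumpAlloc(a: var BumpArena, size: int): pointer =
      # The whole "allocation" is a bounds check and an addition.
      doAssert size > 0 and a.top + size <= a.buf.len
      result = addr a.buf[a.top]
      a.top += size

    var arena: BumpArena
    let p1 = arena.bumpAlloc(129 * sizeof(int))  # roughly one MyBuffer
    let p2 = arena.bumpAlloc(16)
    # The second block starts exactly where the first one ends.
    doAssert cast[int](p2) - cast[int](p1) == 129 * sizeof(int)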

Below is my version of the above benchmark. I haven't tested it on Windows; 
the worst that might happen is that you get a rather large file called 
'nul' full of numbers.
    
    
    import times, random
    
    proc malloc(size: uint): pointer {.header: "<stdlib.h>", importc: "malloc".}
    proc free(p: pointer) {.header: "<stdlib.h>", importc: "free".}
    
    const bufferSize = 128
    
    type
      MyBuffer = array[bufferSize, int]
    
    proc testStackAllocation(): int =
      var s: MyBuffer
      result = cast[int](addr s)
    
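    proc testSequenceAllocation(): int =
      # Reserves room for bufferSize ints through Nim's allocator
      var s = newSeqOfCap[int](bufferSize)
      # Address of the local seq variable; only used to keep the result alive
      result = cast[int](addr s)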
    
    proc testMallocAllocation(): int =
      var res = malloc(uint(sizeof(MyBuffer)))
      free(res)
      result = cast[int](res)
    
    proc testRandom(): int =
      # rand(6) returns a value in 0..6 (replaces the older random(7))
      result = rand(6)
    
    
    proc main =
      # Use writing to /dev/null to prevent compiler optimizations
      when defined(posix):
        let nullfh = open("/dev/null", fmReadWrite)
      else:
        let nullfh = open("nul", fmWrite) # untested!
      
      var baseline: float
      
      # Establish a baseline: the time for a really simple operation plus
      # writing to the null file. This essentially measures how fast the
      # file can be written to, so it can be factored out of the other
      # measurements.
      let z = cpuTime()
      for i in 0..10_000_000:
        nullfh.write(i)
      baseline = cpuTime() - z
      echo "Baseline time: ", baseline
      
      # Template to run test procedures.
      template runProc(testProc: typed, testName: string): untyped =
        let t = cpuTime()
        for _ in 0..10_000_000:
          var i = testProc()
          nullfh.write(i)
        echo "Time for ", testName, ": ", (cpuTime() - t) - baseline
      
      # Test:
      #  - Stack allocation (which is usually a bump allocator)
      #  - Sequence allocation (Nim's allocator)
      #  - Malloc allocation (system defined)
      #  - Random number generation
      runProc(testStackAllocation, "stack allocation test")
      runProc(testSequenceAllocation, "sequence allocation test")
      runProc(testMallocAllocation, "malloc allocation test")
      runProc(testRandom, "random number generation test")
    
    main()
    

Output using various GC backends (macOS Sierra, 2.5 GHz Intel Core i7):
    
    
    # nim c -d:release --passC:"-flto" --passL:"-flto" --gc:markAndSweep benchmark.nim && ./benchmark
    Baseline time: 1.072926
    Time for stack allocation test: 0.1643110000000001
    Time for sequence allocation test: 1.142116
    Time for malloc allocation test: 0.8133139999999999
    Time for random number generation test: -0.1466420000000002
    
    # nim c -d:release --passC:"-flto" --passL:"-flto" benchmark.nim && ./benchmark
    Baseline time: 1.053752
    Time for stack allocation test: 0.1456770000000001
    Time for sequence allocation test: 1.227938
    Time for malloc allocation test: 0.8247639999999994
    Time for random number generation test: -0.1094160000000002
    
    # Version of the benchmark with mark and sweep cycle collection disabled
    # nim c -d:release --passC:"-flto" --passL:"-flto" benchmark.nim && ./benchmark
    Baseline time: 1.075432
    Time for stack allocation test: 0.146002
    Time for sequence allocation test: 1.189319
    Time for malloc allocation test: 0.7623239999999996
    Time for random number generation test: -0.1765280000000002
    

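As you can see, Nim's allocator (exercised here by the sequence test) is slower 
than the system malloc, but not by much (I chalk some of this up to the 
compiler being able to use better inlining and intrinsics for malloc). Neither 
comes close to stack allocation, but again, that's expected. The negative times 
for the random number test just mean that loop finished faster than the 
baseline loop, probably because it only ever writes single-digit numbers, so 
the baseline subtraction should be read as approximate.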
